1,084 research outputs found

    Configuration Management of Distributed Systems over Unreliable and Hostile Networks

    Get PDF
    Economic incentives of large criminal profits and the threat of legal consequences have pushed criminals to continuously improve their malware, especially command and control channels. This thesis applied concepts from successful malware command and control to explore the survivability and resilience of benign configuration management systems. This work expands on existing stage models of malware life cycle to contribute a new model for identifying malware concepts applicable to benign configuration management. The Hidden Master architecture is a contribution to master-agent network communication. In the Hidden Master architecture, communication between master and agent is asynchronous and can operate trough intermediate nodes. This protects the master secret key, which gives full control of all computers participating in configuration management. Multiple improvements to idempotent configuration were proposed, including the definition of the minimal base resource dependency model, simplified resource revalidation and the use of imperative general purpose language for defining idempotent configuration. Following the constructive research approach, the improvements to configuration management were designed into two prototypes. This allowed validation in laboratory testing, in two case studies and in expert interviews. In laboratory testing, the Hidden Master prototype was more resilient than leading configuration management tools in high load and low memory conditions, and against packet loss and corruption. Only the research prototype was adaptable to a network without stable topology due to the asynchronous nature of the Hidden Master architecture. The main case study used the research prototype in a complex environment to deploy a multi-room, authenticated audiovisual system for a client of an organization deploying the configuration. The case studies indicated that imperative general purpose language can be used for idempotent configuration in real life, for defining new configurations in unexpected situations using the base resources, and abstracting those using standard language features; and that such a system seems easy to learn. Potential business benefits were identified and evaluated using individual semistructured expert interviews. Respondents agreed that the models and the Hidden Master architecture could reduce costs and risks, improve developer productivity and allow faster time-to-market. Protection of master secret keys and the reduced need for incident response were seen as key drivers for improved security. Low-cost geographic scaling and leveraging file serving capabilities of commodity servers were seen to improve scaling and resiliency. Respondents identified jurisdictional legal limitations to encryption and requirements for cloud operator auditing as factors potentially limiting the full use of some concepts

    Auditable and performant Byzantine consensus for permissioned ledgers

    Get PDF
    Permissioned ledgers allow users to execute transactions against a data store, and retain proof of their execution in a replicated ledger. Each replica verifies the transactions’ execution and ensures that, in perpetuity, a committed transaction cannot be removed from the ledger. Unfortunately, this is not guaranteed by today’s permissioned ledgers, which can be re-written if an arbitrary number of replicas collude. In addition, the transaction throughput of permissioned ledgers is low, hampering real-world deployments, by not taking advantage of multi-core CPUs and hardware accelerators. This thesis explores how permissioned ledgers and their consensus protocols can be made auditable in perpetuity; even when all replicas collude and re-write the ledger. It also addresses how Byzantine consensus protocols can be changed to increase the execution throughput of complex transactions. This thesis makes the following contributions: 1. Always auditable Byzantine consensus protocols. We present a permissioned ledger system that can assign blame to individual replicas regardless of how many of them misbehave. This is achieved by signing and storing consensus protocol messages in the ledger and providing clients with signed, universally-verifiable receipts. 2. Performant transaction execution with hardware accelerators. Next, we describe a cloud-based ML inference service that provides strong integrity guarantees, while staying compatible with current inference APIs. We change the Byzantine consensus protocol to execute machine learning (ML) inference computation on GPUs to optimize throughput and latency of ML inference computation. 3. Parallel transactions execution on multi-core CPUs. Finally, we introduce a permissioned ledger that executes transactions, in parallel, on multi-core CPUs. We separate the execution of transactions between the primary and secondary replicas. The primary replica executes transactions on multiple CPU cores and creates a dependency graph of the transactions that the backup replicas utilize to execute transactions in parallel.Open Acces

    ACiS: smart switches with application-level acceleration

    Full text link
    Network performance has contributed fundamentally to the growth of supercomputing over the past decades. In parallel, High Performance Computing (HPC) peak performance has depended, first, on ever faster/denser CPUs, and then, just on increasing density alone. As operating frequency, and now feature size, have levelled off, two new approaches are becoming central to achieving higher net performance: configurability and integration. Configurability enables hardware to map to the application, as well as vice versa. Integration enables system components that have generally been single function-e.g., a network to transport data—to have additional functionality, e.g., also to operate on that data. More generally, integration enables compute-everywhere: not just in CPU and accelerator, but also in network and, more specifically, the communication switches. In this thesis, we propose four novel methods of enhancing HPC performance through Advanced Computing in the Switch (ACiS). More specifically, we propose various flexible and application-aware accelerators that can be embedded into or attached to existing communication switches to improve the performance and scalability of HPC and Machine Learning (ML) applications. We follow a modular design discipline through introducing composable plugins to successively add ACiS capabilities. In the first work, we propose an inline accelerator to communication switches for user-definable collective operations. MPI collective operations can often be performance killers in HPC applications; we seek to solve this bottleneck by offloading them to reconfigurable hardware within the switch itself. We also introduce a novel mechanism that enables the hardware to support MPI communicators of arbitrary shape and that is scalable to very large systems. In the second work, we propose a look-aside accelerator for communication switches that is capable of processing packets at line-rate. Functions requiring loops and states are addressed in this method. The proposed in-switch accelerator is based on a RISC-V compatible Coarse Grained Reconfigurable Arrays (CGRAs). To facilitate usability, we have developed a framework to compile user-provided C/C++ codes to appropriate back-end instructions for configuring the accelerator. In the third work, we extend ACiS to support fused collectives and the combining of collectives with map operations. We observe that there is an opportunity of fusing communication (collectives) with computation. Since the computation can vary for different applications, ACiS support should be programmable in this method. In the fourth work, we propose that switches with ACiS support can control and manage the execution of applications, i.e., that the switch be an active device with decision-making capabilities. Switches have a central view of the network; they can collect telemetry information and monitor application behavior and then use this information for control, decision-making, and coordination of nodes. We evaluate the feasibility of ACiS through extensive RTL-based simulation as well as deployment in an open-access cloud infrastructure. Using this simulation framework, when considering a Graph Convolutional Network (GCN) application as a case study, a speedup of on average 3.4x across five real-world datasets is achieved on 24 nodes compared to a CPU cluster without ACiS capabilities

    Cybersecurity: Past, Present and Future

    Full text link
    The digital transformation has created a new digital space known as cyberspace. This new cyberspace has improved the workings of businesses, organizations, governments, society as a whole, and day to day life of an individual. With these improvements come new challenges, and one of the main challenges is security. The security of the new cyberspace is called cybersecurity. Cyberspace has created new technologies and environments such as cloud computing, smart devices, IoTs, and several others. To keep pace with these advancements in cyber technologies there is a need to expand research and develop new cybersecurity methods and tools to secure these domains and environments. This book is an effort to introduce the reader to the field of cybersecurity, highlight current issues and challenges, and provide future directions to mitigate or resolve them. The main specializations of cybersecurity covered in this book are software security, hardware security, the evolution of malware, biometrics, cyber intelligence, and cyber forensics. We must learn from the past, evolve our present and improve the future. Based on this objective, the book covers the past, present, and future of these main specializations of cybersecurity. The book also examines the upcoming areas of research in cyber intelligence, such as hybrid augmented and explainable artificial intelligence (AI). Human and AI collaboration can significantly increase the performance of a cybersecurity system. Interpreting and explaining machine learning models, i.e., explainable AI is an emerging field of study and has a lot of potentials to improve the role of AI in cybersecurity.Comment: Author's copy of the book published under ISBN: 978-620-4-74421-

    Hierarchical Relative Lempel-Ziv Compression

    Get PDF
    Relative Lempel-Ziv (RLZ) parsing is a dictionary compression method in which a string S is compressed relative to a second string R (called the reference) by parsing S into a sequence of substrings that occur in R. RLZ is particularly effective at compressing sets of strings that have a high degree of similarity to the reference string, such as a set of genomes of individuals from the same species. With the now cheap cost of DNA sequencing, such datasets have become extremely abundant and are rapidly growing. In this paper, instead of using a single reference string for the entire collection, we investigate the use of different reference strings for subsets of the collection, with the aim of improving compression. In particular, we propose a new compression scheme hierarchical relative Lempel-Ziv (HRLZ) which form a rooted tree (or hierarchy) on the strings and then compress each string using RLZ with parent as reference, storing only the root of the tree in plain text. To decompress, we traverse the tree in BFS order starting at the root, decompressing children with respect to their parent. We show that this approach leads to a twofold improvement in compression on bacterial genome datasets, with negligible effect on decompression time compared to the standard single reference approach. We show that an effective hierarchy for a given set of strings can be constructed by computing the optimal arborescence of a completed weighted digraph of the strings, with weights as the number of phrases in the RLZ parsing of the source and destination vertices. We further show that instead of computing the complete graph, a sparse graph derived using locality-sensitive hashing can significantly reduce the cost of computing a good hierarchy, without adversely effecting compression performance

    Modern data analytics in the cloud era

    Get PDF
    Cloud Computing ist die dominante Technologie des letzten Jahrzehnts. Die Benutzerfreundlichkeit der verwalteten Umgebung in Kombination mit einer nahezu unbegrenzten Menge an Ressourcen und einem nutzungsabhängigen Preismodell ermöglicht eine schnelle und kosteneffiziente Projektrealisierung für ein breites Nutzerspektrum. Cloud Computing verändert auch die Art und Weise wie Software entwickelt, bereitgestellt und genutzt wird. Diese Arbeit konzentriert sich auf Datenbanksysteme, die in der Cloud-Umgebung eingesetzt werden. Wir identifizieren drei Hauptinteraktionspunkte der Datenbank-Engine mit der Umgebung, die veränderte Anforderungen im Vergleich zu traditionellen On-Premise-Data-Warehouse-Lösungen aufweisen. Der erste Interaktionspunkt ist die Interaktion mit elastischen Ressourcen. Systeme in der Cloud sollten Elastizität unterstützen, um den Lastanforderungen zu entsprechen und dabei kosteneffizient zu sein. Wir stellen einen elastischen Skalierungsmechanismus für verteilte Datenbank-Engines vor, kombiniert mit einem Partitionsmanager, der einen Lastausgleich bietet und gleichzeitig die Neuzuweisung von Partitionen im Falle einer elastischen Skalierung minimiert. Darüber hinaus führen wir eine Strategie zum initialen Befüllen von Puffern ein, die es ermöglicht, skalierte Ressourcen unmittelbar nach der Skalierung auszunutzen. Cloudbasierte Systeme sind von fast überall aus zugänglich und verfügbar. Daten werden häufig von zahlreichen Endpunkten aus eingespeist, was sich von ETL-Pipelines in einer herkömmlichen Data-Warehouse-Lösung unterscheidet. Viele Benutzer verzichten auf die Definition von strikten Schemaanforderungen, um Transaktionsabbrüche aufgrund von Konflikten zu vermeiden oder um den Ladeprozess von Daten zu beschleunigen. Wir führen das Konzept der PatchIndexe ein, die die Definition von unscharfen Constraints ermöglichen. PatchIndexe verwalten Ausnahmen zu diesen Constraints, machen sie für die Optimierung und Ausführung von Anfragen nutzbar und bieten effiziente Unterstützung bei Datenaktualisierungen. Das Konzept kann auf beliebige Constraints angewendet werden und wir geben Beispiele für unscharfe Eindeutigkeits- und Sortierconstraints. Darüber hinaus zeigen wir, wie PatchIndexe genutzt werden können, um fortgeschrittene Constraints wie eine unscharfe Multi-Key-Partitionierung zu definieren, die eine robuste Anfrageperformance bei Workloads mit unterschiedlichen Partitionsanforderungen bietet. Der dritte Interaktionspunkt ist die Nutzerinteraktion. Datengetriebene Anwendungen haben sich in den letzten Jahren verändert. Neben den traditionellen SQL-Anfragen für Business Intelligence sind heute auch datenwissenschaftliche Anwendungen von großer Bedeutung. In diesen Fällen fungiert das Datenbanksystem oft nur als Datenlieferant, während der Rechenaufwand in dedizierten Data-Science- oder Machine-Learning-Umgebungen stattfindet. Wir verfolgen das Ziel, fortgeschrittene Analysen in Richtung der Datenbank-Engine zu verlagern und stellen das Grizzly-Framework als DataFrame-zu-SQL-Transpiler vor. Auf dieser Grundlage identifizieren wir benutzerdefinierte Funktionen (UDFs) und maschinelles Lernen (ML) als wichtige Aufgaben, die von einer tieferen Integration in die Datenbank-Engine profitieren würden. Daher untersuchen und bewerten wir Ansätze für die datenbankinterne Ausführung von Python-UDFs und datenbankinterne ML-Inferenz.Cloud computing has been the groundbreaking technology of the last decade. The ease-of-use of the managed environment in combination with nearly infinite amount of resources and a pay-per-use price model enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with the environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore we introduce a buffer pre-heating strategy that allows to mitigate a cold start after scaling and leads to an immediate performance benefit using scaling. Second, cloud based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from bulk loads or ETL pipelines in a traditional data warehouse solution. Many users do not define database constraints in order to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution and offer efficient update support. The concept can be applied to arbitrary constraints and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. For these cases the database system might only act as data delivery, while the computational effort takes place in data science or machine learning (ML) environments. As this workflow has several drawbacks, we follow the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this we identify user-defined functions (UDFs) and machine learning inference as important tasks that would benefit from a deeper engine integration and investigate approaches to push these operations towards the database engine

    Design and Development of Biofeedback Stick Technology (BfT) to Improve the Quality of Life of Walking Stick Users

    Get PDF
    Biomedical engineering has seen a rapid growth in recent times, where the aim to facilitate and equip humans with the latest technology has become widespread globally. From high-tech equipment ranging from CT scanners, MRI equipment, and laser treatments, to the design, creation, and implementation of artificial body parts, the field of biomedical engineering has significantly contributed to mankind. Biomedical engineering has facilitated many of the latest developments surrounding human mobility, with advancement in mobility aids improving human movement for people with compromised mobility either caused by an injury or health condition. A review of the literature indicated that mobility aids, especially walking sticks, and appropriate training for their use, are generally prescribed by allied health professionals (AHP) to walking stick users for rehabilitation and activities of daily living (ADL). However, feedback from AHP is limited to the clinical environment, leaving walking stick users vulnerable to falls and injuries due to incorrect usage. Hence, to mitigate the risk of falls and injuries, and to facilitate a routine appraisal of individual patient’s usage, a simple, portable, robust, and reliable tool was developed which provides the walking stick users with real-time feedback upon incorrect usage during their activities of daily living (ADL). This thesis aimed to design and develop a smart walking stick technology: Biofeedback stick technology (BfT). The design incorporates the approach of patient and public involvement (PPI) in the development of BfT to ensure that BfT was developed as per the requirements of walking stick users and AHP recommendations. The newly developed system was tested quantitatively for; validity, reliability, and reproducibility against gold standard equipment such as the 3D motion capture system, force plates, optical measurement system for orientation, weight bearing, and step count. The system was also tested qualitatively for its usability by conducting semi-informal interviews with AHPs and walking stick users. The results of these studies showed that the newly developed system has good accuracy, reported above 95% with a maximum inaccuracy of 1°. The data reported indicates good reproducibility. The angles, weight, and steps recorded by the system during experiments are within the values published in the literature. From these studies, it was concluded that, BfT has the potential to improve the lives of walking stick users and that, with few additional improvements, appropriate approval from relevant regulatory bodies, and robust clinical testing, the technology has a huge potential to carve its way to a commercial market

    Geographic information extraction from texts

    Get PDF
    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction
    • …
    corecore