13 research outputs found

    Adaptive P2P platform for data sharing

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    On-premise containerized, light-weight software solutions for Biomedicine

    Get PDF
    Bioinformatics software systems are critical tools for analysing large-scale biological data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several software approaches on the design and implementation of bioinformatics software systems. These approaches include software patterns, microservices, distributed computing, containerisation and container orchestration. The research focuses on understanding how these techniques affect bioinformatics software systems’ reliability, scalability, performance, and efficiency. Furthermore, this research highlights the challenges and considerations involved in their implementation. This study also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to enhance the productivity and performance of bioinformatics software systems. The research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can significantly improve the code accessibility and structure of bioinformatics software systems. Specifically, microservices and containerisation also enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting advanced software engineering practices, such as model-driven design and container orchestration, can facilitate efficient and productive deployment and management of bioinformatics software systems, even for researchers with limited resources. Overall, we develop a software system integrating all our findings. Our proposed system demonstrated the ability to address challenges in bioinformatics. The thesis makes several key contributions in addressing the research questions surrounding the design, implementation, and optimisation of bioinformatics software systems using software patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can significantly improve bioinformatics software systems’ reliability, scalability, performance, efficiency, and productivity.Bioinformatische Software-Systeme stellen bedeutende Werkzeuge für die Analyse umfangreicher biologischer Daten dar. Ihre Entwicklung und Implementierung kann jedoch aufgrund der erforderlichen Zuverlässigkeit, Skalierbarkeit und Leistungsfähigkeit eine Herausforderung darstellen. Das Ziel dieser Arbeit ist es, die Auswirkungen von Software-Mustern, Microservices, verteilten Systemen, Containerisierung und Container-Orchestrierung auf die Architektur und Implementierung von bioinformatischen Software-Systemen zu untersuchen. Die Forschung konzentriert sich darauf, zu verstehen, wie sich diese Techniken auf die Zuverlässigkeit, Skalierbarkeit, Leistungsfähigkeit und Effizienz von bioinformatischen Software-Systemen auswirken und welche Herausforderungen mit ihrer Konzeptualisierungen und Implementierung verbunden sind. Diese Arbeit untersucht auch potenzielle Lösungen zur Implementierung von Container-Orchestrierung in bioinformatischen Forschungsteams mit begrenzten Ressourcen und die Einschränkungen bei deren Verwendung in diesem Kontext. Des Weiteren werden die Schlüsselfaktoren, die den Erfolg von bioinformatischen Software-Systemen mit Containerisierung, Microservices und verteiltem Computing beeinflussen, untersucht und wie diese im Design- und Implementierungsprozess optimiert werden können, um die Produktivität und Leistung bioinformatischer Software-Systeme zu steigern. Die vorliegende Arbeit wurde mittels einer Kombination aus Software-Entwicklung, Experimenten und Evaluation durchgeführt. Die erzielten Ergebnisse zeigen, dass die Implementierung von Software-Mustern, die Zuverlässigkeit und Skalierbarkeit von bioinformatischen Software-Systemen erheblich verbessern kann. Der Einsatz von Microservices und Containerisierung trug ebenfalls zur Steigerung der Zuverlässigkeit, Skalierbarkeit und Leistungsfähigkeit des Systems bei. Darüber hinaus legt die Arbeit dar, dass die Anwendung von SoftwareEngineering-Praktiken, wie modellgesteuertem Design und Container-Orchestrierung, die effiziente und produktive Bereitstellung und Verwaltung von bioinformatischen Software-Systemen erleichtern kann. Zudem löst die Implementierung dieses SoftwareSystems, Herausforderungen für Forschungsgruppen mit begrenzten Ressourcen. Insgesamt hat das System gezeigt, dass es in der Lage ist, Herausforderungen im Bereich der Bioinformatik zu bewältigen und stellt somit ein wertvolles Werkzeug für Forscher in diesem Bereich dar. Die vorliegende Arbeit leistet mehrere wichtige Beiträge zur Beantwortung von Forschungsfragen im Zusammenhang mit dem Entwurf, der Implementierung und der Optimierung von Software-Systemen für die Bioinformatik unter Verwendung von Prinzipien und Praktiken der Softwaretechnik. Unsere Ergebnisse deuten darauf hin, dass die Einbindung dieser Technologien die Zuverlässigkeit, Skalierbarkeit, Leistungsfähigkeit, Effizienz und Produktivität bioinformatischer Software-Systeme erheblich verbessern kann

    Accelerating Event Stream Processing in On- and Offline Systems

    Get PDF
    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity

    Urban Informatics

    Get PDF
    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity
    corecore