
    The 5th Conference of PhD Students in Computer Science


    An Intermediate Data-driven Methodology for Scientific Workflow Management System to Support Reusability

    Automatic processing of different logical sub-tasks by a set of rules is a workflow. A workflow management system (WfMS) helps us accomplish a complex scientific task by arranging sub-tasks, available as tools, into a sequence. In a WfMS, workflows are composed of modules from various domains, and many collaborators from those domains are involved in the workflow design process. Workflow Management Systems (WfMSs) have gained popularity in recent years for managing the various tools in a system and ensuring dependencies while building a sequence of executions for scientific analyses. Because heterogeneous tools are involved and collaboration is required, Collaborative Scientific Workflow Management Systems (CSWfMSs) have attracted significant interest in the scientific analysis community. In such systems, big data explosion issues arise, with massive velocity and variety characteristics for the heterogeneous data coming from different domains. Therefore, a large amount of heterogeneous data needs to be managed in a Scientific Workflow Management System (SWfMS) with a proper decision mechanism. Although a number of studies have addressed the cost management of data, none of the existing studies considers a real-time decision mechanism or a reusability mechanism. Moreover, frequent execution of workflows in a SWfMS generates a massive amount of data, and the characteristics of such data are always incremental. Input data and module outcomes of a workflow in a SWfMS are usually large, so processing such data-intensive workflows is time-consuming, with modules that are computationally expensive for their respective inputs. In addition, the lack of data reusability, limited error recovery, inefficient workflow processing, inefficient storage of derived data, missing metadata association, and missing validation of the effectiveness of techniques in existing systems all need to be addressed in a SWfMS for efficient workflow building while coping with the big data explosion. To address these issues, in this thesis we first propose an intermediate data management scheme for a SWfMS. In our second study, we explore the possibilities and introduce an automatic recommendation technique for a SWfMS based on real-world workflow data (i.e., Galaxy [1] workflows); our investigation shows that the proposed technique can facilitate 51% of workflow building in a SWfMS by reusing intermediate data of previous workflows and can reduce workflow-building execution time by 74%. Later we propose an adaptive version of the technique that considers the states of tools in a SWfMS, which shows around 40% reusability for workflows. Consequently, in our fourth study, we carry out several experiments to analyze the performance and explore the effectiveness of the technique in a SWfMS under various environments. The technique emphasizes storage cost reduction, increased data reusability, and faster workflow execution, and to the best of our knowledge it is the first of its kind. The detailed architecture and evaluation of the technique are presented in this thesis. We believe our findings and the developed system will contribute significantly to the research domain of SWfMSs.
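
    To make the reuse idea concrete, the following is a minimal sketch (the class and function names are assumptions, not the thesis's implementation) of storing module outcomes keyed by the tool, its parameters, and a digest of its input, so that a later workflow reaching the same step can fetch the stored intermediate data instead of re-executing the module.

        import hashlib
        import json

        class IntermediateDataStore:
            """Hypothetical cache of module outcomes, keyed by (tool, parameters, input digest)."""

            def __init__(self):
                self._store = {}  # key -> stored module outcome

            def _key(self, tool, params, input_data):
                digest = hashlib.sha256(repr(input_data).encode()).hexdigest()
                return (tool, json.dumps(params, sort_keys=True), digest)

            def run_or_reuse(self, tool, params, input_data, run_fn):
                key = self._key(tool, params, input_data)
                if key in self._store:                      # intermediate result already derived
                    return self._store[key]
                outcome = run_fn(input_data, **params)      # expensive module execution
                self._store[key] = outcome                  # keep derived data for future workflows
                return outcome

        # Example: two workflows sharing the same first step reuse its stored outcome.
        store = IntermediateDataStore()
        clean = lambda text, lowercase: text.lower() if lowercase else text
        out1 = store.run_or_reuse("clean", {"lowercase": True}, "RAW Sample", clean)
        out2 = store.run_or_reuse("clean", {"lowercase": True}, "RAW Sample", clean)  # cache hit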

    A Software Agent for Adaptive Navigation Support in a Restricted Internet Area

    This thesis deals with the development of a software system that helps a user search for information in the World Wide Web. The particular problem considered here is support in a well-defined, restricted Web area. Two support strategies are considered. The first strategy is to present a visitor with views of the local hyperlink structure that depend on the current position in hyperspace and on previous navigation decisions. The main sub-problems of realizing such support are addressed, such as registering user behavior, registering information about the Web area, and presenting support information on the client side. In contrast to similar systems, the developed system can be applied immediately by a large fraction of Internet users; the only requirement on the client side is Java support in the browser. The second support strategy considered is an estimation of the pertinence of data objects and sequences in the Web for a specific client. This estimation is based on the client's previous navigation behavior and on the registered navigation behavior of other users (collaborative filtering). The approach taken in this thesis to estimate relevant data objects is to predict a user's future data requests. For this purpose, the presented system stores user information on the server side. User behavior is modeled by graphs whose nodes represent requested data objects and whose edges represent transitions. A new method is presented for predicting future navigation steps, based on a distribution estimate over the registered graphs and a classification of a new (partial) navigation profile with regard to the estimated distribution. The different steps of the presented algorithm are evaluated using generated and observed profiles.
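
    As a rough, hedged illustration of the prediction idea described above (the sessions and function below are invented for the example, not the thesis's algorithm), registered sessions can be aggregated into a weighted transition graph whose normalized edge counts give an empirical distribution over next requests; a new partial navigation profile is then matched against that graph to predict the following step.

        from collections import defaultdict

        # Observed navigation sessions (sequences of requested data objects).
        sessions = [
            ["index", "news", "article7", "contact"],
            ["index", "news", "article3"],
            ["index", "search", "article7"],
        ]

        # Build a weighted transition graph: node -> {successor: count}.
        transitions = defaultdict(lambda: defaultdict(int))
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                transitions[current][nxt] += 1

        def predict_next(partial_profile):
            """Predict the most likely next request from the last node of a partial profile."""
            last = partial_profile[-1]
            successors = transitions.get(last)
            if not successors:
                return None
            total = sum(successors.values())
            # Return the successor with the highest estimated transition probability.
            return max(successors, key=lambda node: successors[node] / total)

        print(predict_next(["index", "news"]))  # -> "article7"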

    Cloud-efficient modelling and simulation of magnetic nano materials

    Scientific simulations are rarely attempted in a cloud because of the substantial performance costs of virtualization. Considerable communication overheads, intolerable latencies, and inefficient hardware emulation are the main reasons why this emerging technology has not been fully exploited. On the other hand, the progress of computing infrastructure nowadays depends strongly on the development of prospective storage media, where efficient micromagnetic simulations play a vital role in future memory design. This thesis addresses both topics by merging micromagnetic simulations with the latest OpenStack cloud implementation, providing a time- and cost-effective alternative to expensive computing centers. However, many challenges have to be addressed before a high-performance cloud platform emerges as a solution for problems in the micromagnetic research community. First, the best solver candidate has to be selected and further improved, particularly in the parallelization and process-communication domain. Second, a 3-level cloud communication hierarchy needs to be recognized and each segment adequately addressed. The required steps include breaking VM isolation to activate the host's shared memory, tuning and optimizing the cloud network stack, and integrating efficient communication hardware. The project work concludes with practical measurements and confirmation of the successfully implemented simulation in an open-source cloud environment. As a result, the renewed Magpar solver runs for the first time in the OpenStack cloud, using ivshmem for shared-memory communication. Extensive measurements also proved the effectiveness of our solutions, yielding results from sixty percent to over ten times better than those achieved in the standard cloud.
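
    As a hedged sketch of how a guest process might attach to an ivshmem shared-memory region (the device path and region size below are assumptions for illustration; the actual Magpar/MPI integration in the thesis is more involved), the ivshmem PCI device exposes the shared memory as a base address register that can be memory-mapped from sysfs inside the VM:

        import mmap
        import os

        # Hypothetical PCI address of the ivshmem device inside the guest.
        BAR2 = "/sys/bus/pci/devices/0000:00:05.0/resource2"
        SIZE = 4 * 1024 * 1024  # assumed size of the shared region

        fd = os.open(BAR2, os.O_RDWR)
        try:
            # Map the shared region; other co-located VMs mapping the same
            # ivshmem object see writes made here without crossing the network stack.
            shm = mmap.mmap(fd, SIZE, mmap.MAP_SHARED,
                            mmap.PROT_READ | mmap.PROT_WRITE)
            shm[:5] = b"hello"      # publish a small message
            print(shm[:5])          # another guest could read the same bytes
            shm.close()
        finally:
            os.close(fd)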

    A Data-Analysis and Sensitivity-Optimization Framework for the KATRIN Experiment

    Presently under construction, the Karlsruhe TRitium Neutrino (KATRIN) experiment is the next-generation tritium beta-decay experiment to perform a direct kinematical measurement of the electron neutrino mass with an unprecedented sensitivity of 200 meV (90% C.L.). This thesis describes the implementation of a consistent data analysis framework, addressing technical aspects of the data-taking process and statistical challenges of a neutrino mass estimation from the beta-decay electron spectrum.
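
    For illustration only (a simplified toy, not the KATRIN analysis framework): the neutrino mass enters the beta spectrum through the phase-space factor near the endpoint E0, so a toy estimation can be phrased as fitting m_nu^2 in a model proportional to (E0 - E) * sqrt((E0 - E)^2 - m_nu^2) to the measured electron spectrum; the numbers below are synthetic.

        import numpy as np
        from scipy.optimize import curve_fit

        def spectrum(E, amplitude, E0, m_nu_sq):
            """Simplified beta-spectrum shape near the endpoint (phase space only,
            no Fermi function, detector response, or backgrounds)."""
            eps = E0 - E
            phase = eps * np.sqrt(np.clip(eps**2 - m_nu_sq, 0.0, None))
            return amplitude * np.where(eps**2 > m_nu_sq, phase, 0.0)

        # Synthetic "measured" spectrum below an assumed endpoint of 18575 eV.
        rng = np.random.default_rng(0)
        E = np.linspace(18540.0, 18574.0, 200)
        truth = spectrum(E, 1.0, 18575.0, 0.04)          # assumed m_nu^2 = 0.04 eV^2
        data = truth + rng.normal(0.0, 0.05, size=E.size)

        # Least-squares estimate of amplitude, endpoint, and m_nu^2.
        popt, pcov = curve_fit(spectrum, E, data, p0=[1.0, 18575.0, 0.0])
        print("fitted m_nu^2 =", popt[2], "eV^2")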

    The 7th Conference of PhD Students in Computer Science


    Optimization for Networks and Object Recognition

    The present thesis explores two different application areas of combinatorial optimization; the work presented is indeed twofold, since it deals with two distinct problems, one related to data transfer in networks and the other to object recognition. Caching is an essential technique for improving throughput and latency in a vast variety of applications. The core idea is to duplicate content in memories distributed across the network, which can then be exploited to deliver requested content with less congestion and delay. In particular, it has been shown that the use of caching together with smart offloading strategies in a RAN composed of evolved NodeBs (eNBs), APs (e.g., WiFi), and UEs can significantly reduce backhaul traffic and service latency. The traditional role of cache memories is to deliver the maximal amount of requested content locally rather than from a remote server. While this approach is optimal for single-cache systems, it has recently been shown to be, in general, significantly suboptimal for systems with multiple caches (i.e., cache networks), since it allows only an additive caching gain; instead, cache memories should be used to enable a multiplicative caching gain. Recent studies have shown that storing different portions of the content across the wireless network caches and capitalizing on the spatial reuse of device-to-device (D2D) communications, or exploiting globally cached information in order to multicast coded messages simultaneously useful to a large number of users, enables a global caching gain. We focus on the case of a single server (e.g., a base station) and multiple users, each of which caches segments of files from a finite library. Each user requests one (whole) file from the library, and the server sends a common coded multicast message to satisfy all users at once. The problem consists of finding the smallest possible codeword length that satisfies such requests. To solve this problem we present two achievable caching and coded delivery schemes and one correlation-aware caching scheme, each based on a heuristic polynomial-time coloring algorithm. Automatic object recognition has become, over the last decades, a central topic in artificial intelligence research, with a significant burst over the last few years following the advent of the deep learning paradigm. In this context, the objective of the work discussed in the last two chapters of this thesis is an attempt at improving the performance of a natural-image classifier by introducing into the loop knowledge coming from the real world, expressed in terms of the probabilities of a set of spatial relations between the objects in the images. In other words, the framework presented in this work aims at integrating the output of standard classifiers on different image parts with some domain knowledge, encoded in a probabilistic ontology.
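
    A toy illustration of the coloring-based coded delivery idea (the instance is invented and the heuristic is a plain greedy coloring, not the thesis's schemes): demands that do not conflict, because each user already caches what the other requests, can receive the same color and be XORed into a single multicast message, so the number of colors bounds the codeword length.

        from itertools import count

        # Hypothetical instance: 3 users, each caching some files and requesting one file.
        caches   = {"u1": {"B", "C"}, "u2": {"A", "C"}, "u3": {"A", "B"}}
        requests = {"u1": "A", "u2": "B", "u3": "C"}

        def conflict(u, v):
            """Two demands conflict if they cannot be combined into one XOR message."""
            fu, fv = requests[u], requests[v]
            if fu == fv:
                return False                      # identical packets never conflict
            return fu not in caches[v] or fv not in caches[u]

        # Heuristic greedy coloring of the conflict graph; each color class becomes
        # one coded (XOR) transmission, so fewer colors means a shorter codeword.
        colors = {}
        for u in requests:
            used = {colors[v] for v in colors if conflict(u, v)}
            colors[u] = next(c for c in count() if c not in used)

        print(colors)                        # here all users share color 0 -> send A^B^C once
        print(len(set(colors.values())))     # number of coded transmissions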

    KBSET -- Knowledge-Based Support for Scholarly Editing and Text Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog

    KBSET is an environment that provides support for scholarly editing in two flavors: first, as a practical tool, KBSET/Letters, that accompanies the development of editions of correspondences (in particular from the 18th and 19th centuries) all the way from source documents to PDF and HTML presentations; second, as a prototypical tool, KBSET/NER, for experimentally investigating novel forms of working on editions that are centered around automated named entity recognition. KBSET can process declarative application-specific markup expressed in LaTeX notation and can incorporate large external fact bases that are typically provided in RDF. KBSET includes specially developed LaTeX styles and a core system written in SWI-Prolog, which is used there in many roles, realizing the potential of Prolog as a unifying language. (To appear in DECLARE 2019 Revised Selected Papers.)
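
    To give a flavor of processing declarative markup of this kind (the \persname and \placename commands and the snippet below are hypothetical and written in Python, not KBSET's actual LaTeX markup or its SWI-Prolog core), annotations embedded in the source could be extracted and later linked against an external fact base:

        import re

        # Hypothetical LaTeX source with declarative, application-specific markup.
        source = r"""
        Brief von \persname{Johann Georg Sulzer} an \persname{Immanuel Kant},
        geschrieben in \placename{Berlin} im Jahre 1770.
        """

        # Collect annotated entities by markup command (the command names are illustrative).
        entities = {}
        for command, value in re.findall(r"\\(persname|placename)\{([^}]*)\}", source):
            entities.setdefault(command, []).append(value)

        print(entities)
        # {'persname': ['Johann Georg Sulzer', 'Immanuel Kant'], 'placename': ['Berlin']}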