110 research outputs found

    Managing cache for efficient query processing

    Ph.D. (Doctor of Philosophy)

    Adaptive and secured resource management in distributed and Internet systems

    The effectiveness of computer system resource management has always been determined by two major factors: (1) workload demands and management objectives, and (2) updates in computer technology. These two factors change dynamically, and resource management systems must adapt to the changes in a timely manner. This dissertation addresses several important and related resource management issues. We first study memory system utilization in centralized servers by improving the memory performance of sorting algorithms, which provides a fundamental understanding of memory system organization and its performance optimization for data-intensive workloads. To reduce different types of cache misses, we restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows substantial performance improvements from our new methods. We have further extended the work to improve load sharing for utilizing global memory resources in distributed systems. Aiming to reduce the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies that consider effective usage of global memory in addition to CPU load balancing in both homogeneous and heterogeneous clusters. Extending our research from clusters to Internet systems, we have further investigated memory and storage utilization in Web caching systems. We have proposed several novel management schemes to restructure and decentralize the existing caching system by exploiting data locality at different levels of the global memory hierarchy and by effectively sharing data objects among the clients and their proxy caches. Our decentralized Web caching system design raises data integrity and communication anonymity issues, which are also security concerns for general peer-to-peer systems. We propose an integrity protocol to ensure data integrity, and several protocols to achieve mutual communication anonymity between an information requester and a provider. The potential impact and contributions of this dissertation are briefly stated as follows: (1) The two major research topics identified in this dissertation are fundamentally important for the growth and development of information technology, and will remain in demand for a long time. (2) Our proposed cache-effective sorting methods bridge a serious gap between the analytical complexity of algorithms and their execution complexity in practice, a gap caused by the increasingly deep memory hierarchy in computer systems. This approach can also be used to improve memory performance at different levels of the memory hierarchy, such as I/O and file systems. (3) Our load sharing principle of giving high priority to requests for data accesses in memory and I/O adapts in a timely manner to technology changes and responds effectively to the increasing demand of data-intensive applications. (4) Our proposed decentralized Web caching framework and its resource management schemes present a comprehensive case study for examining the P2P model. Our results and experiences can be used for related and further studies in distributed computing. (5) The proposed data integrity and communication anonymity protocols address limits and weaknesses of existing ones, and lay a solid foundation for us to continue our work in this important area.
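
The tiling idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it omits padding, buffering, and repartitioning, and the function name `tiled_mergesort` and the parameter `tile_size` are hypothetical. It only shows the two-phase structure of sorting tiles assumed to fit in cache and then merging them. Python is used for readability; any cache effect would have to be measured in a lower-level language.

```python
import heapq
from typing import List, Sequence


def tiled_mergesort(data: Sequence[int], tile_size: int = 1024) -> List[int]:
    """Sort `data` by sorting fixed-size tiles first, then merging them.

    `tile_size` is a hypothetical tuning parameter standing in for the
    number of elements that fit in the cache level being targeted.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")

    # Phase 1: sort each tile independently, so the work in this phase
    # only touches a working set of at most `tile_size` elements at a time.
    tiles = [
        sorted(data[start:start + tile_size])
        for start in range(0, len(data), tile_size)
    ]

    # Phase 2: k-way merge of the sorted tiles. Each tile is consumed
    # sequentially from front to back.
    return list(heapq.merge(*tiles))
```

For example, `tiled_mergesort([5, 3, 1, 4, 2], tile_size=2)` sorts the tiles `[5, 3]`, `[1, 4]`, and `[2]` separately before merging them.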

    Software Tool for Validation of Chromatographic Analytical Method

    Many industries rely on analytical procedures to analyze various substances. In the medical field, they are used to perform laboratory analyses. In the pharmaceutical industry, they are used to determine and quantify the active component of a drug product as well as impurities. In the food industry, they are used to identify the properties of foods and their ingredients. An analytical procedure can be viewed as the algorithm of a chemical analysis. Because analytical procedures are so widely used, they must be validated. Validation demonstrates that the chemical analysis described by the analytical procedure is sound and fit for its intended use: that is, that it can measure the target compound with sufficient accuracy. At present, validation is performed manually by analytical chemists, which is tedious and error-prone. Accessible systems that assist analytical chemists during analytical procedure validation are therefore needed. Such systems would not only ensure the consistency of the results but also reduce the workload of analytical chemists. The Department of Chemistry of the University of Tartu has acknowledged the need for such systems and has begun developing one, named ValChrom. This thesis evaluates the strengths and shortcomings of existing software solutions and describes the implementation of ValChrom, a web-based application for analytical procedure validation.

    Methods to enhance content distribution for very large scale online communities

    The Internet has grown exponentially in recent years, and its number of users, far from declining, keeps growing. Popular Web 2.0 services such as Facebook, YouTube, and Twitter have millions of users and rely on vast infrastructures deployed worldwide. These infrastructures are becoming very large in order to support such a massive number of users. This growth has brought new problems regarding scalability, power consumption, cooling, hardware lifetime, underutilization, investment recovery, etc. Owning this kind of infrastructure is not always affordable or convenient, which can be a major handicap for projects that start with a modest budget but whose success depends on reaching a large audience. However, two current technologies may make it possible to deploy vast infrastructures at reduced cost: peer-to-peer networks and cloud computing. Peer-to-peer systems let users contribute their own resources to distributed infrastructures. These systems have proven to be a valuable option, capable of distributing vast amounts of data to large audiences with minimal initial infrastructure. Nevertheless, aspects such as content availability cannot be controlled in these systems, whereas classic server infrastructures can improve this aspect. Recently, the cloud has emerged as a promising paradigm for hosting horizontally scalable Web systems. The cloud offers elastic capabilities that save costs by adapting the amount of resources to the incoming demand. Additionally, the cloud makes a vast amount of resources available for peak workloads. However, determining the amount of resources to use remains a challenge. In this thesis, we describe a hierarchical architecture that combines peer-to-peer and elastic server infrastructures in order to enhance content distribution. The peer-to-peer infrastructure provides a scalable solution that reduces the workload on the servers, while the server infrastructure ensures availability and reduces costs by varying its size when necessary. We propose a distributed collaborative caching infrastructure that employs a cluster-based, locality-aware, self-organizing P2P system. This system leverages collaborative data classification in order to improve content locality. Our evaluation demonstrates that increasing data locality improves data search while reducing traffic. We explore the use of elastic server infrastructures, addressing three issues: system sizing, data grouping, and content distribution. We propose novel multi-model techniques for hierarchical workload prediction. These predictions are used to determine the system size and request distribution policies. Additionally, we propose novel adaptive control techniques that identify inaccurate models and redefine them. Our evaluation, using traces extracted from real systems, indicates that using a hierarchy of multiple models increases prediction accuracy. This hierarchy, in conjunction with our adaptive control techniques, increases accuracy during unexpected workload variations. Finally, we demonstrate that locality-aware request distribution policies can take advantage of prediction models to adapt content distribution independently of the system size.
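
The link between multi-model workload prediction and system sizing described above can be sketched as follows. This is a simplified, flat stand-in for the thesis's hierarchical technique, not a reproduction of it: the two toy models, the function names, and the `lookback` and `capacity_per_server` parameters are all assumptions made for illustration. The sketch picks whichever model had the smallest recent error, then converts its prediction into a server count.

```python
import math
from typing import Callable, Dict, List

Model = Callable[[List[float]], float]


def last_value(history: List[float]) -> float:
    """Toy model: predict that the next value equals the latest one."""
    return history[-1]


def moving_average(history: List[float]) -> float:
    """Toy model: predict the mean of the last three observations."""
    window = history[-3:]
    return sum(window) / len(window)


def pick_model(models: Dict[str, Model], history: List[float],
               lookback: int = 3) -> str:
    """Return the name of the model with the lowest mean absolute error
    over the last `lookback` points of `history`."""
    if len(history) <= lookback:
        raise ValueError("history must be longer than lookback")
    errors = {}
    for name, model in models.items():
        total = 0.0
        for i in range(len(history) - lookback, len(history)):
            # Predict point i from everything observed before it.
            total += abs(model(history[:i]) - history[i])
        errors[name] = total / lookback
    return min(errors, key=errors.get)


def servers_needed(predicted_load: float, capacity_per_server: float) -> int:
    """Size the server pool for the predicted load (at least one server)."""
    return max(1, math.ceil(predicted_load / capacity_per_server))
```

On a steadily rising trace such as `[10, 20, 30, 40, 50, 60]` the last-value model has the smaller recent error, while on an oscillating trace such as `[50, 70, 50, 70, 50, 70]` the moving average does. Re-running the selection as new observations arrive is one simple way to stop relying on a model that has become inaccurate.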

    Distributed Computing in a Pandemic

    The current COVID-19 global pandemic caused by the SARS-CoV-2 betacoronavirus has resulted in over a million deaths and is having a grave socio-economic impact, hence there is an urgency to find solutions to key research challenges. Much of this COVID-19 research depends on distributed computing. In this article, I review distributed architectures (various types of clusters, grids and clouds) that can be leveraged to perform these tasks at scale, at high throughput, with a high degree of parallelism, and which can also be used to work collaboratively. High-performance computing (HPC) clusters will be used to carry out much of this work. Several big data processing tasks used in reducing the spread of SARS-CoV-2 require high-throughput approaches and a variety of tools, which Hadoop and Spark offer, even on commodity hardware. Extremely large-scale COVID-19 research has also utilised some of the world's fastest supercomputers, such as IBM's SUMMIT (for ensemble docking high-throughput screening against SARS-CoV-2 targets for drug-repurposing, and high-throughput gene analysis) and Sentinel, an XPE-Cray based system used to explore natural products. Grid computing has facilitated the formation of the world's first Exascale grid computer. This has accelerated COVID-19 research in molecular dynamics simulations of SARS-CoV-2 spike protein interactions through massively-parallel computation and was performed with over 1 million volunteer computing devices using the Folding@home platform. Both grids and clouds can also be used for international collaboration by enabling access to important datasets and providing services that allow researchers to focus on research rather than on time-consuming data-management tasks.

    Bandwidth management and monitoring for IP network traffic : an investigation

    Bandwidth management is a topic which is often discussed, but on which relatively little work has been done with regard to compiling a comprehensive set of techniques and methods for managing traffic on a network. What work has been done has concentrated on higher end networks, rather than the low bandwidth links which are commonly available in South Africa and other areas outside the United States. With more organisations increasingly making use of the Internet on a daily basis, the demand for bandwidth is outstripping the ability of providers to upgrade their infrastructure. This resource is therefore in need of management. In addition, for Internet access to become economically viable for widespread use by schools, NGOs and other academic institutions, the associated costs need to be controlled. Bandwidth management not only impacts on direct cost control, but encompasses the process of engineering a network and network resources in order to ensure the provision of as optimal a service as possible. Included in this is the provision of user education. Software has been developed for the implementation of traffic quotas, dynamic firewalling and visualisation. The research investigates various methods for monitoring and management of IP traffic with particular applicability to low bandwidth links. Several forms of visualisation for the analysis of historical and near-realtime traffic data are also discussed, including the use of three-dimensional landscapes. A number of bandwidth management practices are proposed, and the advantages of their combination, and complementary use are highlighted. By implementing these suggested policies, a holistic approach can be taken to the issue of bandwidth management on Internet links
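
As a rough illustration of the traffic quotas mentioned above, the sketch below keeps per-host byte counters and reports hosts that exceed a limit. It is not the software developed in the research: the class name `QuotaTracker`, its methods, and the single shared quota are assumptions chosen for brevity, and a real deployment would read counts from a traffic monitor and feed the result into firewall rules.

```python
from collections import defaultdict
from typing import Dict, List


class QuotaTracker:
    """Track bytes transferred per host against one shared quota."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self.usage: Dict[str, int] = defaultdict(int)

    def record(self, host: str, num_bytes: int) -> None:
        """Add `num_bytes` of observed traffic to the host's running total."""
        self.usage[host] += num_bytes

    def over_quota(self) -> List[str]:
        """Return the hosts whose total exceeds the quota, sorted by name."""
        return sorted(
            host for host, used in self.usage.items()
            if used > self.quota_bytes
        )
```

With a quota of 1000 bytes, recording 600 and then 500 bytes for one host puts it over the limit, while a host that has transferred 200 bytes is left alone.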

    Improving Application Performance in the Emerging Hyper-converged Infrastructure

    University of Minnesota Ph.D. dissertation. April 2019. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); viii, 118 pages. In today's world, the hyper-converged infrastructure is emerging as a new type of infrastructure. In the hyper-converged infrastructure, service providers deploy compute, network and storage services on inexpensive hardware rather than expensive proprietary hardware. This allows service providers to customize the services they offer by deploying applications in Virtual Machines (VMs) or containers, and gives them control over all resources, including compute, network and storage. In this hyper-converged infrastructure, improving application performance is an important issue. Throughout my Ph.D. research, I have studied how to improve the performance of applications in the emerging hyper-converged infrastructure. I have focused on improving the performance of applications in VMs and in containers when accessing data, and on improving the performance of applications in the networked storage environment. In the hyper-converged infrastructure, administrators can provide desktop services by deploying a VM-based Virtual Desktop Infrastructure (VDI) application. We first investigate how to identify storage requirements and determine how to meet such requirements with minimal storage resources for the VDI application. We create a model to describe the behavior of VDI, and collect real VDI traces to populate this model. The model allows us to identify the storage requirements of VDI and determine the potential bottlenecks in storage. Based on this information, we can tell what capacity and minimum capability a storage system needs in order to support and satisfy a given VDI configuration. We show that our model can describe more fine-grained storage requirements of VDI compared with the rules of thumb which are currently used in industry. 
In the hyper-converged infrastructure, more and more applications are running in containers. We design and implement a system, called k8sES (k8s Enhanced Storage), that efficiently supports applications with various storage SLOs (Service Level Objectives) along with all other requirements deployed in the Kubernetes environment which is based on containers. Kubernetes (k8s) is a system for managing containerized applications across multiple hosts. The current storage support for containerized applications in k8s is limited. To satisfy users' SLOs, k8s administrators must manually configure storage in advance, and users must know the configurations and capabilities of different types of the provided storage. In k8sES, storage resources are dynamically allocated based on users' requirements. Given users' SLOs, k8sES will select the correct node and storage that can meet their requirements when scheduling applications. The storage allocation mechanism in k8sES also improves the storage utilization efficiency. In addition, we provide a tool to monitor the I/O activities of both applications and storage devices in Kubernetes. With the capabilities of controlling client, network and storage with hyper-convergence, we study how to coordinate different components along the I/O path to ensure latency SLOs for applications in the networked storage environment. We propose and implement JoiNS, a system trying to ensure latency SLOs for applications that access data on remote networked storage. JoiNS carefully considers all the components along the I/O path and controls them in a coordinated fashion. JoiNS has both global network and storage visibility with a logically centralized controller which keeps monitoring the status of each involved component. JoiNS coordinates these components and adjusts the priority of I/Os in each component based on the latency SLO, network and storage status, time estimation, and characteristics of each I/O request
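
The SLO-driven selection of a node and its storage described above can be sketched in a few lines. This is not the k8sES implementation and uses no Kubernetes API: the `StorageOption` and `StorageSLO` types, their fields, and the tie-breaking rule are all hypothetical. The sketch filters out options that cannot meet the SLO and then prefers the tightest fit, which is one plausible way to keep faster storage free for stricter requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageOption:
    node: str       # hypothetical node name
    free_gib: int   # unallocated capacity on this node's storage
    iops: int       # IOPS the storage can sustain


@dataclass
class StorageSLO:
    capacity_gib: int  # capacity the application requests
    min_iops: int      # minimum IOPS the application requires


def select_storage(options: List[StorageOption],
                   slo: StorageSLO) -> Optional[StorageOption]:
    """Return the feasible option with the least excess IOPS, or None
    if no option satisfies both the capacity and IOPS requirements."""
    feasible = [
        o for o in options
        if o.free_gib >= slo.capacity_gib and o.iops >= slo.min_iops
    ]
    if not feasible:
        return None
    # Tightest fit first: least spare IOPS, then least spare capacity.
    return min(
        feasible,
        key=lambda o: (o.iops - slo.min_iops, o.free_gib - slo.capacity_gib),
    )
```

A request that no option can satisfy yields `None`, which a scheduler could treat as a signal to leave the application pending rather than place it on storage that violates its SLO.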