
    A grid and cloud-based framework for high throughput bioinformatics

    Recent advances in genome sequencing technologies have unleashed a flood of new data. As a result, the computational analysis of bioinformatics data sets has been rapidly moving from a lab-based desktop computer environment to exhaustive analyses performed by large dedicated computing resources. Traditionally, large computational problems have been solved on dedicated clusters of high-performance machines that are typically local to, and owned by, a particular institution. The current trend in Grid computing has seen institutions pooling their computational resources in order to offload excess computational work to remote locations during busy periods. In the last year or so, commercial Cloud computing initiatives have matured enough to offer a viable remote source of reliable computational power. Collections of idle desktop computers have also been used as a source of computational power in the form of ‘volunteer Grids’. The field of bioinformatics is highly dynamic, with new or updated versions of software tools and databases continually being developed. Several different tools and datasets must often be combined into a coherent, automated workflow or pipeline. While existing solutions are available for constructing workflows, there is a clear need for long-lived analyses consisting of many interconnected steps to be able to migrate among Grid and Cloud computational resources dynamically. This project involved research into the principles underlying the design and architecture of flexible, high-throughput bioinformatics processes. Following extensive requirements gathering, a novel Grid-based platform, Microbase, has been implemented, based on service-oriented architectures and peer-to-peer data transfer technology. This platform has been shown to be capable of utilising a wide range of hardware, from commodity desktop computers to high-performance cloud infrastructure. The system has been shown to drastically reduce the bandwidth requirements of bioinformatics data distribution, and therefore to reduce both the financial and computational costs associated with cloud computing. The system is inherently modular in nature, comprising a service-based notification system, a data storage system, a scheduler and a job manager. In keeping with e-Science principles, the modules can operate in physical isolation from one another, distributed across an intranet or the Internet. Moreover, since each module is loosely coupled via Web services, modules have the potential to be used in combination with external service-oriented components or in isolation as part of another system. In order to demonstrate the utility of such an open-source system to the bioinformatics community, a pipeline of interconnected bioinformatics applications was developed using the Microbase system to form a high-throughput application for the comparative and visual analysis of microbial genomes. This application, the Automated Genome Analyser (AGA), has been developed to operate without user interaction. AGA exposes its results via Web services, which can be used by further analytical stages within Microbase or by external computational resources via a Web service interface, or queried by users via an interactive genome browser. In addition to providing the necessary infrastructure for scalable Grid applications, a modular development framework has been provided, which simplifies the process of writing Grid applications. Microbase has been adopted by a number of projects ranging from comparative genomics to synthetic biology simulations. (EThOS - Electronic Theses Online Service, United Kingdom)
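
    As a rough, self-contained illustration of the kind of loosely coupled, notification-driven interaction described above, the following Java sketch shows a pipeline stage subscribing to a topic and reacting to a new-genome event by scheduling work. All class, method and topic names here are hypothetical; this is a sketch of the idea, not the Microbase API.

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;
        import java.util.function.Consumer;

        // Hypothetical sketch of a notification-driven, loosely coupled pipeline step.
        // Names are illustrative only; this is not the Microbase API.
        public class NotificationSketch {

            // Minimal in-process publish/subscribe notification service.
            static class NotificationService {
                private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

                void subscribe(String topic, Consumer<String> handler) {
                    subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
                }

                void publish(String topic, String message) {
                    subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(message));
                }
            }

            // A pipeline stage ("responder") that turns events into compute jobs.
            static class ComparativeAnalysisResponder {
                void onNewGenome(String genomeId) {
                    // In a real system this would enqueue a job with the scheduler/job manager
                    // and fetch input data via the (peer-to-peer) data storage module.
                    System.out.println("Scheduling comparative analysis for genome " + genomeId);
                }
            }

            public static void main(String[] args) {
                NotificationService notifications = new NotificationService();
                ComparativeAnalysisResponder responder = new ComparativeAnalysisResponder();

                // The responder depends only on the topic name, not on the publisher,
                // so modules could equally sit on separate machines behind Web-service endpoints.
                notifications.subscribe("genome.new", responder::onNewGenome);
                notifications.publish("genome.new", "NC_000913");
            }
        }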

    Actas da 10ª Conferência sobre Redes de Computadores (Proceedings of the 10th Conference on Computer Networks)

    Universidade do Minho; CCTC; Centro Algoritmi; Cisco Systems; IEEE Portugal Section

    A P2P middleware design for digital access nodes in marginalised rural areas

    This thesis addresses software design within the field of Information and Communications Technology for Development (ICTD). Specifically, it makes a case for the design and development of software which is custom-made for the context of marginalised rural areas (MRAs). One of the main aims of any ICTD project is sustainability, and such sustainability is particularly difficult in MRAs because of the high costs of projects located there. Most literature on ICTD projects focuses on other factors, such as management, regulations, and social and community issues, when discussing this issue. Technical matters are often down-played or ignored entirely. This thesis argues that MRAs exhibit unique technical characteristics and that, by understanding these characteristics, one can design more cost-effective software. One specific characteristic is described and addressed in this thesis: a characteristic we describe here for the first time and call a network island. Further analysis of the literature generates a picture of a distributed network of access nodes (DANs) within such network islands, which are connected by high-speed networks and are able to share resources and stimulate usage of technology by offering a wide range of services. This thesis attempts to design a fitting middleware platform for such a context, which would achieve the following aims: i) allow software developers to create solutions for the context more efficiently (correctly, rapidly); ii) stimulate product managers and business owners to create innovative software products more easily (cost-effectively). A given in the context of this thesis is that the software should use free/libre open source software (FLOSS); good arguments also exist for the use of FLOSS. A review of useful FLOSS frameworks is undertaken, and several of these are examined in an applied part of the thesis to see how useful they may be. They form the basis for a walking-skeleton implementation of the proposed middleware. The Spring framework is the basis for experiments, along with Spring Web Services, JMX and PHP 5's web service capabilities. This thesis builds on three years of work at the Siyakhula Living Lab (SLL), an experimental testbed in an MRA in the Mbashe district of the Eastern Cape of South Africa. Several existing products are deployed at the SLL in the fields of eCommerce, eGovernment and eLearning. Requirements specifications are engineered from a variety of sources, including interviews, mailing lists, the author's experience as a supervisor at the SLL, and a review of the existing SLL products. Future products are also investigated, as the thesis considers current trends in ICTD. Use cases are derived and listed. Most of the use cases are concerned with management functions of DANs that can be automated, so that operators of DANs can focus on their core business and not on technology. Using the UML Components methodology, the thesis then proceeds to design a middleware component architecture that is derived from the requirements specification. The process proceeds step by step, so that the reader can follow how business rules, operations and interfaces are derived from the use cases. Ultimately, the business rules, interfaces and operations are related to business logic, system interfaces and operations that are situated in specific components.
The components in turn are derived from the business information model, which is derived from the business concepts that were initially used to describe the context for the requirements engineering. In this way, a logical method for software design is applied to the problem domain to methodically derive a software design for a middleware solution. The thesis tests the design by considering its possible weaknesses. The network aspect is tested by interpolating from formal assumptions about the nature of the context. The data access layer is also identified as a possible bottleneck; we suggest the use of fast indexing methods instead of relational databases to maintain the flexibility and efficiency of the data layer. Lessons learned from the exercise are discussed within the context of the author's experience in software development teams, as well as in ICTD projects. This synthesis of information leads to warnings about the psychology of middleware development. We note that the ICTD domain is a particularly difficult one with regard to software development, as business requirements are not usually clearly formulated and developers do not have the requisite domain knowledge. In conclusion, the core arguments of the thesis are recounted in bullet form, to lay bare the reasoning behind this work. Novel aspects of the work are also highlighted; they include the description of a network island, and aspects of the DAN middleware requirements engineering and design. Future steps for work based on this thesis are mapped out, and open problems relating to this research are touched upon.
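
    Since most of the use cases concern DAN management functions that can be automated, and JMX is among the FLOSS technologies examined in the thesis, a minimal sketch of exposing such a function as a standard JMX MBean may help make the idea concrete. The interface, object name and metrics below are assumptions for illustration, not part of the SLL middleware.

        import java.lang.management.ManagementFactory;
        import javax.management.MBeanServer;
        import javax.management.ObjectName;

        // Illustrative sketch: exposing an automated DAN management function as a JMX MBean.
        // The interface and metrics are hypothetical, not the actual SLL/DAN middleware API.
        public class DanManagementSketch {

            // Standard MBean convention: the interface name is the implementation name + "MBean".
            public interface AccessNodeMonitorMBean {
                int getConnectedUsers();
                long getFreeDiskSpaceBytes();
                void restartService(String serviceName);
            }

            public static class AccessNodeMonitor implements AccessNodeMonitorMBean {
                @Override public int getConnectedUsers() { return 12; } // placeholder value
                @Override public long getFreeDiskSpaceBytes() {
                    return new java.io.File("/").getFreeSpace();
                }
                @Override public void restartService(String serviceName) {
                    // A real implementation might restart e.g. the eCommerce or eLearning service.
                    System.out.println("Restarting " + serviceName);
                }
            }

            public static void main(String[] args) throws Exception {
                MBeanServer server = ManagementFactory.getPlatformMBeanServer();
                ObjectName name = new ObjectName("dan.middleware:type=AccessNodeMonitor");
                server.registerMBean(new AccessNodeMonitor(), name);
                System.out.println("AccessNodeMonitor registered; attach with jconsole to manage it.");
                Thread.sleep(60_000); // keep the JVM alive briefly so a JMX client can connect
            }
        }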

    A study of mobile phone ad hoc networks via bluetooth with different routing protocols

    The growth of mobile computing is changing the way people communicate. Mobile devices, especially mobile phones, have become cheaper and more powerful, and are able to run more applications and provide networking services. Mobile phones use fixed cellular infrastructure such as base stations and transmission towers to enable users to share multimedia content and access the internet at any time or place. However, using the internet is costly. One solution, therefore, is to create impromptu ad hoc networks to share information among users. Such networks are infrastructureless and self-organising, much like mobile ad hoc networks. This dissertation therefore investigates how mobile phones with low-power Bluetooth technology can be used to create ad hoc networks that connect mobile phones and allow them to share information. The mobile phones should be able to organise themselves for multi-hop communication. Routing becomes important in order to achieve efficiency in data communication. Several existing routing protocols were implemented and evaluated for this network to determine how efficiently they deliver data and deal with network disruptions such as a device moving out of transmission range. Representative routing protocols from mobile ad hoc networking, peer-to-peer networks and publish/subscribe systems were evaluated according to performance metrics defined in the research, namely total traffic, data traffic, control traffic, delay, convergence time, and positive response. Prototypes for Nokia phones were developed and tested in a small ad hoc network. For a practical networking setup, a simple routing protocol that uses the limited mobile phone resources efficiently would be better than a sophisticated routing protocol that keeps routing information about the network participants.
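
    To make the routing discussion more concrete, the following sketch shows generic on-demand route discovery by flooding over a small neighbour graph, with a counter for the control-traffic metric mentioned above. It is a simplified illustration under an assumed topology and assumed names, not one of the protocols or Nokia prototypes evaluated in the dissertation.

        import java.util.*;

        // Minimal sketch of reactive (on-demand) route discovery by flooding over a static
        // neighbour graph, in the spirit of MANET protocols such as AODV. It is a generic
        // illustration, not one of the protocols or prototypes evaluated in the thesis.
        public class RouteDiscoverySketch {

            // Undirected adjacency list: which phones are currently within Bluetooth range.
            static Map<String, List<String>> neighbours = Map.of(
                    "A", List.of("B"),
                    "B", List.of("A", "C"),
                    "C", List.of("B", "D"),
                    "D", List.of("C"));

            static int controlMessages = 0; // one of the metrics used in the evaluation

            // Breadth-first flood of a route request; returns the discovered multi-hop path.
            static List<String> discoverRoute(String source, String destination) {
                Map<String, String> previous = new HashMap<>();
                Deque<String> queue = new ArrayDeque<>(List.of(source));
                previous.put(source, source);
                while (!queue.isEmpty()) {
                    String node = queue.poll();
                    if (node.equals(destination)) break;
                    for (String next : neighbours.getOrDefault(node, List.of())) {
                        controlMessages++; // every forwarded request counts as control traffic
                        if (previous.putIfAbsent(next, node) == null) queue.add(next);
                    }
                }
                // Walk the predecessor map back from the destination to rebuild the path.
                if (!previous.containsKey(destination)) return List.of();
                LinkedList<String> path = new LinkedList<>();
                for (String n = destination; !n.equals(source); n = previous.get(n)) path.addFirst(n);
                path.addFirst(source);
                return path;
            }

            public static void main(String[] args) {
                System.out.println("Route A -> D: " + discoverRoute("A", "D"));
                System.out.println("Control messages sent: " + controlMessages);
            }
        }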

    A heterogeneous mobile cloud computing model for hybrid clouds

    Mobile cloud computing is a paradigm that delivers applications to mobile devices by using cloud computing. In this way, mobile cloud computing allows for a rich user experience: since client applications run remotely in the cloud infrastructure, they use fewer resources on the user's mobile device. In this paper, we present a new mobile cloud computing model in which platforms of volunteer devices provide part of the resources of the cloud, inspired by both the volunteer computing and mobile edge computing paradigms. These platforms may be hierarchical, based on the capabilities of the volunteer devices and the requirements of the services provided by the clouds. We also describe the orchestration between the volunteer platform and public, private or hybrid clouds. As we show, this new model can be an inexpensive solution for different application scenarios, highlighting its benefits in cost savings, elasticity, scalability, load balancing, and efficiency. Moreover, the evaluation performed also shows that the proposed model is a feasible solution for cloud services that have a large number of mobile users. (C) 2018 Elsevier B.V. All rights reserved. This work has been partially supported by the Spanish MINISTERIO DE ECONOMÍA Y COMPETITIVIDAD under the project grant TIN2016-79637-P TOWARDS UNIFICATION OF HPC AND BIG DATA PARADIGMS.
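
    A minimal sketch of the orchestration idea, under assumed class names and a simple capability-based placement rule, might look as follows: a request is placed on the lowest suitable tier of the volunteer hierarchy and otherwise offloaded to the cloud. It illustrates the concept only and is not the model's actual interface.

        import java.util.Comparator;
        import java.util.List;
        import java.util.Optional;

        // Illustrative sketch: try to place a service request on a suitable volunteer device
        // first, falling back to the (public/private/hybrid) cloud otherwise. Names, fields and
        // the placement rule are assumptions for this sketch, not the model's actual interfaces.
        public class VolunteerOrchestratorSketch {

            record VolunteerDevice(String id, int cpuCores, int freeMemoryMb, int tier) {}
            record ServiceRequest(String name, int requiredCores, int requiredMemoryMb) {}

            static Optional<VolunteerDevice> place(List<VolunteerDevice> devices, ServiceRequest req) {
                // Prefer the lowest (closest-to-the-user) tier of the hierarchy that can host the request.
                return devices.stream()
                        .filter(d -> d.cpuCores() >= req.requiredCores()
                                  && d.freeMemoryMb() >= req.requiredMemoryMb())
                        .min(Comparator.comparingInt(VolunteerDevice::tier));
            }

            public static void main(String[] args) {
                List<VolunteerDevice> platform = List.of(
                        new VolunteerDevice("phone-17", 4, 1024, 1),
                        new VolunteerDevice("laptop-03", 8, 8192, 2));

                ServiceRequest small = new ServiceRequest("image-resize", 2, 512);
                ServiceRequest large = new ServiceRequest("video-transcode", 16, 16384);

                System.out.println(place(platform, small)
                        .map(d -> small.name() + " -> volunteer " + d.id())
                        .orElse(small.name() + " -> offload to cloud"));
                System.out.println(place(platform, large)
                        .map(d -> large.name() + " -> volunteer " + d.id())
                        .orElse(large.name() + " -> offload to cloud"));
            }
        }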

    Virtual Machine Image Management for Elastic Resource Usage in Grid Computing

    Grid Computing has evolved from an academic concept to a powerful paradigm in the area of high performance computing (HPC). Over the last few years, powerful Grid computing solutions were developed that allow the execution of computational tasks on distributed computing resources. Grid computing has recently attracted many commercial customers. To enable commercial customers to process sensitive data in the Grid, strong security mechanisms must be put in place to secure the customers' data. In contrast, the development of Cloud Computing, which entered the scene in 2006, was driven by industry: it was designed with security in mind from the beginning. Virtualization technology is used to separate users, e.g. by putting the different users of a system inside virtual machines, which prevents them from accessing other users' data. The use of virtualization in the context of Grid computing was examined early on and found to be a promising approach to counter the security threats that have appeared with commercial customers. One main part of the work presented in this thesis is the Image Creation Station (ICS), a component which allows users to administer their virtual execution environments (virtual machines) themselves and which is responsible for managing and distributing the virtual machines in the entire system. In contrast to Cloud computing, which was designed to allow even inexperienced users to execute their computational tasks in the Cloud easily, Grid computing is much more complex to use. The ICS makes the Grid easier to use by overcoming traditional limitations such as the need to install required software on the compute nodes used to execute the computational tasks. This allows users to bring commercial software to the Grid for the first time, without the need for local administrators to install the software on computing nodes that are accessible by all users. Moreover, the administrative burden is shifted from the local Grid site's administrator to the users or to experienced software providers, enabling the provision of individually tailored virtual machines to each user. The ICS is not only responsible for enabling users to manage their virtual machines themselves; it also ensures that the virtual machines are available on every site that is part of the distributed Grid system. A second aspect of the presented solution focuses on the elasticity of the system by automatically acquiring free external resources depending on the system's current workload. In contrast to existing systems, the presented approach allows the system's administrator to add or remove resource sets during runtime without needing to restart the entire system. Moreover, the presented solution allows users not only to use existing Grid resources but also to scale out to Cloud resources and use these resources on demand. By ensuring that unused resources are shut down as soon as possible, the computational costs of a given task are minimized. In addition, the presented solution allows each user to specify which resources can be used to execute a particular job. This is useful when a job processes sensitive data, e.g. data that is not allowed to leave the company. To obtain comparable functionality in today's systems, a user must submit her computational task to a particular resource set, losing the ability to schedule automatically if more than one set of resources can be used.
In addition, the proposed solution prioritizes each set of resources by taking different metrics into account (e.g. the level of trust or the computational costs) and tries to schedule a job to the resources with the highest priority first. Notably, the priority often mirrors the physical distance between the resources and the user: a locally available cluster usually has a higher priority due to its high level of trust and its computational costs, which are usually lower than the costs of using Cloud resources. This scheduling strategy therefore minimizes the costs of job execution while improving security at the same time, since data is not necessarily transferred to remote resources and the probability of attacks by malicious external users is minimized. Bringing both components together results in a system that adapts automatically to the current workload by using external (e.g. Cloud) resources together with existing locally available resources or Grid sites, and that provides individually tailored virtual execution environments to the system's users.
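
    The priority-based selection among allowed resource sets could be sketched roughly as below; the weights, metrics, names and values are assumptions for illustration, not the scheduler described in the thesis.

        import java.util.Comparator;
        import java.util.List;
        import java.util.Optional;
        import java.util.Set;

        // Rough sketch of the scheduling idea described above: each resource set gets a priority
        // derived from metrics such as trust and cost, and a job is placed on the highest-priority
        // set that the user has allowed for it. All names and weights are hypothetical.
        public class ResourcePrioritySketch {

            record ResourceSet(String name, double trust, double costPerHour) {
                // Higher trust and lower cost give a higher priority (weights are arbitrary here).
                double priority() { return 2.0 * trust - costPerHour; }
            }

            record Job(String id, Set<String> allowedResourceSets) {}

            static Optional<ResourceSet> schedule(Job job, List<ResourceSet> sets) {
                return sets.stream()
                        .filter(s -> job.allowedResourceSets().contains(s.name()))
                        .max(Comparator.comparingDouble(ResourceSet::priority));
            }

            public static void main(String[] args) {
                List<ResourceSet> sets = List.of(
                        new ResourceSet("local-cluster", 1.0, 0.1),
                        new ResourceSet("partner-grid-site", 0.7, 0.2),
                        new ResourceSet("public-cloud", 0.4, 0.8));

                // A job with sensitive data may exclude the public cloud entirely.
                Job sensitive = new Job("genome-job-42", Set.of("local-cluster", "partner-grid-site"));
                schedule(sensitive, sets).ifPresent(s ->
                        System.out.println(sensitive.id() + " scheduled on " + s.name()));
            }
        }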

    Service-oriented models for audiovisual content storage

    What are the important topics to understand when involved with storage services for digital audiovisual content? This report examines how content is created and moves into and out of storage; the storage service value networks and architectures found now and expected in the future; what sort of data transfer is expected to and from an audiovisual archive; which transfer protocols to use; and a summary of security and interface issues.

    Modeling, simulations, and experiments to balance performance and fairness in P2P file-sharing systems

    Doctor of Philosophy, Department of Electrical and Computer Engineering, Don Gruenbacher, Caterina Scoglio. In this dissertation, we investigate research gaps still existing in P2P file-sharing systems: the necessity of fairness maintenance during the content-information publishing/retrieving process, and the effect of stranger policies on P2P fairness. First, through a wide range of measurements in the KAD network, we present the impact of a poorly designed incentive fairness policy on the performance of looking up content information. The KAD network, designed to help peers publish and retrieve sharing information, adopts distributed hash table (DHT) technology and is integrated into the aMule/eMule P2P file-sharing network. We develop a distributed measurement framework that employs multiple test nodes running on the PlanetLab testbed. During the measurements, the routing tables of around 20,000 peers are crawled and analyzed. More than 3,000,000 pieces of source location information from the publishing tables of multiple peers are retrieved and contacted. Based on these measurements, we show that the routing table is well maintained, while the maintenance policy for the source-location-information publishing table is not well designed. Both the current maintenance schedule for the publishing table and the poor incentive policy on publishing peers eventually result in the low availability of the publishing table, which in turn causes the low lookup performance of the KAD network. Moreover, we propose three possible solutions to address these issues: a self-maintenance scheme with a short renewal interval, a chunk-based publishing/retrieving scheme, and a fairness scheme. Second, using both numerical analyses and agent-based simulations, we evaluate the impact of different stranger policies on system performance and fairness. We find that an extremely restrictive stranger policy brings the best fairness at the cost of performance degradation. The trends in performance and fairness under different stranger policies are not consistent: a trade-off exists between controlling free-riding and maintaining system performance. Thus, P2P designers need to handle strangers carefully according to their individual design goals. We also show that BitTorrent prefers to maintain fairness with an extremely restrictive stranger policy, while aMule/eMule's fully rewarding stranger policy promotes free-riders' benefit.
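
    The role of the renewal interval in keeping the publishing table available can be illustrated with a toy sketch: entries expire unless republished in time, so a short self-maintenance period preserves lookup success. The structures, names and timings are assumptions made for the example; this is not KAD or aMule/eMule code.

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Toy sketch of the availability issue in a DHT publishing table: source-location entries
        // expire unless republished, so a shorter renewal interval keeps more of them alive.
        public class PublishingTableSketch {

            record SourceEntry(String peerAddress, long expiresAtMillis) {}

            private final Map<String, SourceEntry> table = new ConcurrentHashMap<>();

            // A peer (re)publishes where content can be downloaded from, with a limited lifetime.
            void publish(String contentHash, String peerAddress, long lifetimeMillis) {
                table.put(contentHash, new SourceEntry(peerAddress,
                        System.currentTimeMillis() + lifetimeMillis));
            }

            // A lookup succeeds only while the entry has not expired.
            boolean lookup(String contentHash) {
                SourceEntry e = table.get(contentHash);
                return e != null && e.expiresAtMillis() > System.currentTimeMillis();
            }

            public static void main(String[] args) throws InterruptedException {
                PublishingTableSketch node = new PublishingTableSketch();
                node.publish("hash-1", "10.0.0.5:4672", 100); // 100 ms lifetime for the demo

                System.out.println("lookup right after publish: " + node.lookup("hash-1"));
                Thread.sleep(150); // renewal interval too long: the entry expires
                System.out.println("lookup after missing a renewal: " + node.lookup("hash-1"));

                node.publish("hash-1", "10.0.0.5:4672", 100); // timely republish restores availability
                System.out.println("lookup after republishing: " + node.lookup("hash-1"));
            }
        }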