243 research outputs found

    Architectural Refactoring for Fast and Modular Bioinformatics Sequence Search

    Bioinformaticists use the Basic Local Alignment Search Tool (BLAST) to characterize an unknown sequence by comparing it against a database of known sequences, thus detecting evolutionary relationships and biological properties. mpiBLAST is a widely used, high-performance, open-source parallelization of BLAST that runs on a computer cluster and delivers super-linear speedups. However, the Achilles' heel of mpiBLAST is its lack of modularity, which adversely affects maintainability and extensibility; an effective architectural refactoring would benefit both users and developers. This paper describes our experiences in the architectural refactoring of mpiBLAST into a modular, high-performance software package. Our evaluation of five component-oriented designs culminated in a design that enables modularity while retaining high performance. Furthermore, we achieved this refactoring effectively and efficiently using eXtreme Programming techniques. These experiences will be of value to software engineers faced with the challenge of creating maintainable, extensible, high-performance bioinformatics software.
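The component-oriented direction described above can be illustrated with a small sketch; the interfaces and names below (Scheduler, SearchEngine, and a toy substring matcher standing in for BLAST) are hypothetical illustrations, not mpiBLAST's actual design:

```python
# Illustrative sketch of a component-oriented search design: each concern
# (scheduling, searching) sits behind a small interface so implementations
# can be swapped without touching the rest of the system.
# All names here are hypothetical, not mpiBLAST's actual API.
from abc import ABC, abstractmethod

class Scheduler(ABC):
    @abstractmethod
    def assign(self, fragments, workers):
        """Map database fragments to workers."""

class RoundRobinScheduler(Scheduler):
    def assign(self, fragments, workers):
        # Distribute fragments cyclically across workers.
        plan = {w: [] for w in workers}
        for i, frag in enumerate(fragments):
            plan[workers[i % len(workers)]].append(frag)
        return plan

class SearchEngine(ABC):
    @abstractmethod
    def search(self, query, fragment):
        """Search one database fragment for the query."""

class SubstringSearch(SearchEngine):
    # Toy stand-in for a real alignment engine such as BLAST.
    def search(self, query, fragment):
        return [s for s in fragment if query in s]

def run_search(query, fragments, workers, scheduler, engine):
    # The driver depends only on the interfaces, not the implementations,
    # which is the modularity property the refactoring aims for.
    plan = scheduler.assign(fragments, workers)
    hits = []
    for worker, assigned in plan.items():
        for frag in assigned:
            hits.extend(engine.search(query, frag))
    return hits
```

Swapping in a different `Scheduler` or `SearchEngine` requires no change to `run_search`, which is the kind of decoupling a component-oriented refactoring targets.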

    On-premise containerized, light-weight software solutions for Biomedicine

    Bioinformatics software systems are critical tools for analysing large-scale biological data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several software approaches on the design and implementation of bioinformatics software systems. These approaches include software patterns, microservices, distributed computing, containerisation and container orchestration. The research focuses on understanding how these techniques affect bioinformatics software systems’ reliability, scalability, performance, and efficiency, and highlights the challenges and considerations involved in their implementation. This study also examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources, and the challenges of using container orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to enhance the productivity and performance of bioinformatics software systems. The research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can significantly improve the code accessibility and structure of bioinformatics software systems. Microservices and containerisation likewise enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting advanced software engineering practices, such as model-driven design and container orchestration, can facilitate efficient and productive deployment and management of bioinformatics software systems, even for researchers with limited resources. Overall, we developed a software system that integrates all our findings; the proposed system demonstrated the ability to address challenges in bioinformatics.
The thesis makes several key contributions in addressing the research questions surrounding the design, implementation, and optimisation of bioinformatics software systems using software patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can significantly improve bioinformatics software systems’ reliability, scalability, performance, efficiency, and productivity.
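The benefit the thesis attributes to software patterns, namely small, independently testable, composable units, can be illustrated with a minimal pipeline sketch; the stages below are toy stand-ins, not the thesis's actual system:

```python
# A minimal sketch of the "pipeline" software pattern as it might appear in a
# bioinformatics tool: each stage is a small, independently testable function,
# and the pipeline composes them. Stage names are illustrative only.
def parse_fasta(text):
    # Very simplified FASTA parsing: header lines start with '>'.
    records, header, seq = {}, None, []
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(seq)
            header, seq = line[1:].strip(), []
        else:
            seq.append(line.strip())
    if header is not None:
        records[header] = "".join(seq)
    return records

def filter_short(records, min_len):
    # Drop sequences shorter than min_len.
    return {h: s for h, s in records.items() if len(s) >= min_len}

def gc_content(records):
    # Fraction of G and C bases per sequence.
    return {h: (s.count("G") + s.count("C")) / len(s) for h, s in records.items()}

def run_pipeline(text, min_len=4):
    # Compose the stages; swapping or inserting a stage leaves the others intact.
    return gc_content(filter_short(parse_fasta(text), min_len))
```

Because each stage is a pure function over plain data, stages can be unit-tested, replaced, or containerised individually, which is the structural benefit the thesis reports.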

    CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

    The use of graphics processing units (GPUs) in high-performance parallel computing continues to become more prevalent, often as part of a heterogeneous system. For years, CUDA has been the de facto programming environment for nearly all general-purpose GPU (GPGPU) applications. In spite of this, the framework is available only on NVIDIA GPUs, traditionally requiring reimplementation in other frameworks in order to utilize additional multi- or many-core devices. On the other hand, OpenCL provides an open and vendor-neutral programming environment and runtime system. With implementations available for CPUs, GPUs, and other types of accelerators, OpenCL therefore holds the promise of a “write once, run anywhere” ecosystem for heterogeneous computing. Given the many similarities between CUDA and OpenCL, manually porting a CUDA application to OpenCL is typically straightforward, albeit tedious and error-prone. In response to this issue, we created CU2CL, an automated CUDA-to-OpenCL source-to-source translator that features a novel design and clever reuse of the Clang compiler framework. Currently, the CU2CL translator covers the primary constructs found in the CUDA runtime API, and we have successfully translated many applications from the CUDA SDK and the Rodinia benchmark suite. The performance of the applications automatically translated by CU2CL is on par with that of their manually ported counterparts.
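The flavour of such a source-to-source mapping can be conveyed with a toy textual translator; CU2CL itself rewrites at the AST level via Clang and handles far more than identifier renaming, so the table below is only a small illustrative sample of CUDA-to-OpenCL correspondences:

```python
# Toy illustration of source-to-source translation: map a few CUDA
# identifiers to their closest OpenCL counterparts by whole-word textual
# substitution. This ignores argument differences (e.g. clFinish takes a
# command queue) that a real translator such as CU2CL must handle.
import re

CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "cudaError_t": "cl_int",
    "cudaThreadSynchronize": "clFinish",
}

def translate(source):
    # Replace whole identifiers only, so e.g. "my__global__x" is untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, CUDA_TO_OPENCL)) + r")\b")
    return pattern.sub(lambda m: CUDA_TO_OPENCL[m.group(1)], source)
```

Working on the AST instead of raw text, as CU2CL does, is what makes it possible to translate constructs (kernel launches, memory transfers) whose CUDA and OpenCL forms differ structurally, not just lexically.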

    An Iteration on the Horizon Simulation Framework to Include .NET and Python Scripting

    Modeling and simulation is a crucial element of the aerospace engineering design process because it allows designers to thoroughly test their solution before investing the resources to create it. The Horizon Simulation Framework (HSF) v3.0 is an aerospace modeling and simulation tool that allows the user to verify system-level requirements in the early phases of the design process. A low-fidelity model of the system, created by the user, is exhaustively tested within the built-in Day-in-the-Life simulator to provide useful information in the form of failed requirements, system bottlenecks and leverage points, and potential schedules of operations. The model can be stood up quickly with Extensible Markup Language (XML) input files or custom-built with Python scripts that interact with the framework at runtime. The goal of the work presented in this thesis is to progress HSF from v2.3 to v3.0 in order to take advantage of current software development technologies. This includes converting the codebase from C++ and Lua scripting to C# and Python scripting. The particulars of the considerations, benefits, and implementation of the new framework are discussed in detail. The simulation data and runtime performance of the new framework were compared to those of the old framework. The new framework was found to produce similar data outputs with a faster run time.
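The runtime-scripting idea, a user-written Python model that the simulator calls at each step, can be sketched as follows; the hook name and simulation loop here are hypothetical, not HSF's actual interface:

```python
# Hedged sketch of runtime scripting: the framework advances time and defers
# model behaviour to a user-supplied Python object with an agreed method name.
# The "evaluate" hook and this loop are illustrative, not HSF's real API.
class BatteryModel:
    # A user-written low-fidelity model: drains charge each step and reports
    # whether the "keep charge above threshold" requirement still holds.
    def __init__(self, charge=100.0, drain=7.0, threshold=20.0):
        self.charge = charge
        self.drain = drain
        self.threshold = threshold

    def evaluate(self, t):
        self.charge -= self.drain
        return self.charge >= self.threshold

def simulate(model, steps):
    # Day-in-the-Life style loop: run the model over the horizon and report
    # the first step (if any) at which a requirement fails.
    for t in range(steps):
        if not model.evaluate(t):
            return t  # requirement violated at step t
    return None  # all requirements held
```

Because the model is an ordinary Python object, users can redefine behaviour without recompiling the framework, which is the main attraction of scripting against a compiled C# core.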

    Visualization and exploration of next-generation proteomics data


    The CCP4 suite:integrative software for macromolecular crystallography


    Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.

    This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type, with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. First, genome-wide association studies, which have contributed to the discovery of a head and neck cancer mutation association. Second, medical records analysis, which has significantly increased the statistical power of treatment/outcome models in the UK’s largest psychiatric patient cohort. Third, richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside the authors’ own group) who work in text processing for biomedicine and other areas. GATE is available online under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.
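The pipeline-of-annotators idea underlying such systems can be sketched in a few lines; the annotators below are toy stand-ins, not GATE components:

```python
# Illustrative sketch of the annotation-pipeline idea behind systems like
# GATE: a document flows through a sequence of independent annotators, each
# adding (start, end, label) annotations over the same text. The annotators
# here are toy stand-ins, not GATE processing resources.
import re

def tokenizer(text, annotations):
    # Mark every word-like span as a Token.
    for m in re.finditer(r"\w+", text):
        annotations.append((m.start(), m.end(), "Token"))

def gene_gazetteer(text, annotations):
    # Dictionary lookup, as a gazetteer might do for gene names.
    for name in ("BRCA1", "TP53"):
        for m in re.finditer(re.escape(name), text):
            annotations.append((m.start(), m.end(), "Gene"))

def run_annotators(text, annotators):
    # Each annotator sees the full text and contributes stand-off annotations.
    annotations = []
    for annotate in annotators:
        annotate(text, annotations)
    return annotations
```

Stand-off annotations (spans over unchanged text) are what let independent components, written by different groups, be chained and reused, which is the lifecycle property the article emphasises.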

    Design a market hub platform for utilities

    Master's dissertation in Computer Science. Software engineering has contributed, over the years, to better and more efficient production of software. Through its methodologies and processes, it has been used to increase the assurance that the software produced is robust, of quality, easy to update and, above all, conforms to the requirements identified by the stakeholders. With the growth of data sharing, collection and storage in the utilities sector, there is also an urgent need to maintain and use the available information in a useful way. This requires the adoption of strategies and infrastructures that can help leverage this form of treatment to make it possible to improve the quality of certain utility sectors (gas, electricity, water, internet, communications). However, much of this process is still controlled solely by the production and distribution companies, preventing all other users from participating in the process. To try to undo the supremacy that the production and distribution companies have in the utilities panorama, and in support of the European Commission's vision, the energy sector is trying to move towards a liberalized market and, with this, aims to enable all entities, such as consumers, retailers, producers and distributors, to contribute to the management of the network and for new business models to emerge. To sustain this ecosystem, where these entities can communicate and share data, it would be advantageous to have a platform that allows the communication of such data between all users. In this project, through the application of software engineering techniques, we develop, step by step, a model for a modular, scalable and integrated environment that enables demand response, data exploration and storage, and complies with the European Union's General Data Protection Regulation (GDPR).
AtravĂ©s de metodologias e processos, Ă© possĂ­vel produzir software produzido robusto, com qualidade, confiĂĄvel, que possa ser atualizado e, acima de tudo, que respeite os requisitos identificados pelas partes interessadas. Com o crescimento da partilha, coleta e armazenamento de dados no setor de serviços pĂșblicos, existe uma necessidade urgente de manter e usar as informaçÔes disponĂ­veis de maneira Ăștil, com que se consiga extrair conhecimento. Isso requer a adoção de estratĂ©gias e infraestruturas que possam apoiar essa forma de tratamento para que, a partir da coleta de informaçÔes, seja possĂ­vel melhorar a qualidade de certos setores de serviços pĂșblicos (gĂĄs, eletricidade, ĂĄgua, internet, comunicaçÔes). No entanto, e nos tempos que correm, grande parte desses processos Ă© controlada exclusivamente pelas empresas de produção e distribuição, impedindo que os demais utilizadores participem do processo. De forma desfazer a supremacia que as empresas de produção e distribuição tĂȘm, e com apoio e motivação da ComissĂŁo Europeia, o setor energĂ©tico tem vindo a tentar implementar um mercado de energia liberalizado e, com isso, permitir que todas as entidades como, consumidores, retalhistas, produtores, distribuidores, possam contribuir para a gestĂŁo da rede e na criação de novos modelos de negĂłcio. Para sustentar este ecossistema, onde todas as entidades podem comunicar e compartilhar dados, seria Ăștil e vantajoso ter uma plataforma que permitisse a comunicação de tais dados entre todos os utilizadores. 
This work is financed by the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalisation – COMPETE 2020 Programme within project «POCI-01-0145-FEDER-006961», and by National Funds through the Portuguese funding agency, FCT – Fundação para a Ciência e a Tecnologia, as part of project «UID/EEA/50014/2013».

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions coming from the Forum on that topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with the flavour of computer science and Grid technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of science, so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.
