243 research outputs found
Architectural Refactoring for Fast and Modular Bioinformatics Sequence Search
Bioinformaticists use the Basic Local Alignment Search Tool (BLAST) to characterize an unknown sequence by
comparing it against a database of known sequences, thus detecting evolutionary relationships and biological properties. mpiBLAST is a widely-used, high-performance, open-source parallelization of BLAST that runs on a computer cluster delivering super-linear speedups. However, the Achilles heel of mpiBLAST is its lack of modularity, adversely affecting maintainability and extensibility; an effective architectural refactoring will benefit both users and developers.
This paper describes our experiences in the architectural refactoring of mpiBLAST into a modular, high-performance software package. Our evaluation of five component-oriented designs culminated in a design that enables modularity while retaining high performance. Furthermore, we achieved this refactoring effectively and efficiently using eXtreme Programming techniques. These experiences will be of value to software engineers faced with the challenge of creating maintainable and extensible high-performance bioinformatics software.
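As a rough illustration of what a modular parallel sequence search can look like, the following is a minimal Python sketch, not mpiBLAST's actual design or API. It assumes a database-segmentation scheme (each worker searches one fragment of the database and the partial hit lists are merged), uses threads in place of MPI processes, and treats the scorer as a swappable component. The names `shared_kmers`, `split_database`, `search_fragment`, and `parallel_search` are hypothetical, and the k-mer scorer is a toy stand-in for a real alignment algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, int]                 # (sequence id, score)
Scorer = Callable[[str, str], int]    # pluggable scoring component


def shared_kmers(query: str, subject: str, k: int = 3) -> int:
    """Toy scorer (assumption): count distinct k-mers shared by both sequences."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def split_database(db: Dict[str, str], n_fragments: int) -> List[Dict[str, str]]:
    """Distribute database entries round-robin over n_fragments fragments."""
    fragments: List[Dict[str, str]] = [{} for _ in range(n_fragments)]
    for i, (seq_id, seq) in enumerate(sorted(db.items())):
        fragments[i % n_fragments][seq_id] = seq
    return fragments


def search_fragment(query: str, fragment: Dict[str, str], scorer: Scorer) -> List[Hit]:
    """Worker component: score the query against every sequence in one fragment."""
    return [(seq_id, scorer(query, seq)) for seq_id, seq in fragment.items()]


def parallel_search(query: str, db: Dict[str, str], scorer: Scorer = shared_kmers,
                    n_workers: int = 2, top: int = 3) -> List[Hit]:
    """Search all fragments concurrently, then merge and rank the partial results."""
    fragments = split_database(db, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = pool.map(lambda frag: search_fragment(query, frag, scorer), fragments)
        hits = [hit for part in partial for hit in part]
    # Highest score first; ties broken by sequence id for a stable order.
    hits.sort(key=lambda h: (-h[1], h[0]))
    return hits[:top]
```

Because the scorer and the fragment search are separate functions, either one can be replaced without touching the merge step, which is the kind of separation a component-oriented design aims for.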
On-premise containerized, light-weight software solutions for Biomedicine
Bioinformatics software systems are critical tools for analysing large-scale biological
data, but their design and implementation can be challenging due to the need for reliability, scalability, and performance. This thesis investigates the impact of several
software approaches on the design and implementation of bioinformatics software
systems. These approaches include software patterns, microservices, distributed
computing, containerisation and container orchestration. The research focuses on
understanding how these techniques affect bioinformatics software systems' reliability, scalability, performance, and efficiency. Furthermore, this research highlights
the challenges and considerations involved in their implementation. This study also
examines potential solutions for implementing container orchestration in bioinformatics research teams with limited resources and the challenges of using container
orchestration. Additionally, the thesis considers microservices and distributed computing and how these can be optimised in the design and implementation process to
enhance the productivity and performance of bioinformatics software systems. The
research was conducted using a combination of software development, experimentation, and evaluation. The results show that implementing software patterns can
significantly improve the code accessibility and structure of bioinformatics software
systems. Microservices and containerisation also enhanced system reliability, scalability, and performance. Additionally, the study indicates that adopting
advanced software engineering practices, such as model-driven design and container
orchestration, can facilitate efficient and productive deployment and management of
bioinformatics software systems, even for researchers with limited resources. Overall, we developed a software system integrating all our findings. Our proposed system
demonstrated the ability to address challenges in bioinformatics. The thesis makes
several key contributions in addressing the research questions surrounding the design,
implementation, and optimisation of bioinformatics software systems using software
patterns, microservices, containerisation, and advanced software engineering principles and practices. Our findings suggest that incorporating these technologies can
significantly improve bioinformatics software systems' reliability, scalability, performance, efficiency, and productivity.
CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures
The use of graphics processing units (GPUs) in
high-performance parallel computing continues to become more
prevalent, often as part of a heterogeneous system. For years,
CUDA has been the de facto programming environment for
nearly all general-purpose GPU (GPGPU) applications. In spite
of this, the framework is available only on NVIDIA GPUs,
traditionally requiring reimplementation in other frameworks
in order to utilize additional multi- or many-core devices.
On the other hand, OpenCL provides an open and vendor-neutral
programming environment and runtime system. With
implementations available for CPUs, GPUs, and other types of
accelerators, OpenCL therefore holds the promise of a "write
once, run anywhere" ecosystem for heterogeneous computing.
Given the many similarities between CUDA and OpenCL,
manually porting a CUDA application to OpenCL is typically
straightforward, albeit tedious and error-prone. In response
to this issue, we created CU2CL, an automated CUDA-to-OpenCL
source-to-source translator that combines a novel design
with clever reuse of the Clang compiler framework. Currently,
the CU2CL translator covers the primary constructs found in
the CUDA runtime API, and we have successfully translated many
applications from the CUDA SDK and Rodinia benchmark suite.
The performance of our automatically translated applications via
CU2CL is on par with their manually ported counterparts.
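To make the "many similarities between CUDA and OpenCL" concrete, here is a toy Python sketch that rewrites a handful of kernel-side CUDA constructs into their standard OpenCL equivalents. It is not how CU2CL works: the abstract states that CU2CL builds on the Clang compiler framework, whereas this sketch is a plain textual substitution with a hypothetical function name (`translate_kernel_text`) and covers only the few one-dimensional constructs listed in the table.

```python
import re

# Standard CUDA -> OpenCL correspondences for a few kernel-side constructs.
CUDA_TO_OPENCL = {
    "__global__": "__kernel",
    "__shared__": "__local",
    "__syncthreads()": "barrier(CLK_LOCAL_MEM_FENCE)",
    "threadIdx.x": "get_local_id(0)",
    "blockIdx.x": "get_group_id(0)",
    "blockDim.x": "get_local_size(0)",
    "gridDim.x": "get_num_groups(0)",
}

# Try longer keys first so no key can be shadowed by a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CUDA_TO_OPENCL, key=len, reverse=True))
)


def translate_kernel_text(cuda_src: str) -> str:
    """Replace each known CUDA construct with its OpenCL counterpart (toy, text-only)."""
    return _PATTERN.sub(lambda m: CUDA_TO_OPENCL[m.group(0)], cuda_src)


if __name__ == "__main__":
    cuda = ("__global__ void scale(float *x) { "
            "int i = blockIdx.x * blockDim.x + threadIdx.x; x[i] *= 2.0f; }")
    print(translate_kernel_text(cuda))
```

A textual pass like this leaves real work undone. For example, OpenCL kernel pointer arguments need an address-space qualifier such as `__global`, and host-side runtime calls have to be rewritten as well, which is part of why a compiler-based translator is the more robust approach.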
An Iteration on the Horizon Simulation Framework to Include .NET and Python Scripting
Modeling and simulation is a crucial element of the aerospace engineering design process because it allows designers to thoroughly test their solution before investing in the resources to create it. The Horizon Simulation Framework (HSF) v3.0 is an aerospace modeling and simulation tool that allows the user to verify system-level requirements in the early phases of the design process. A low-fidelity model of the system, created by the user, is exhaustively tested within the built-in Day-in-the-Life simulator to provide useful information in the form of failed requirements, system bottlenecks and leverage points, and potential schedules of operations. The model can be stood up quickly with Extensible Markup Language (XML) input files or can be custom-built with Python scripts that interact with the framework at runtime. The goal of the work presented in this thesis is to progress HSF from v2.3 to v3.0 in order to take advantage of current software development technologies. This includes converting the codebase from C++ and Lua scripting to C# and Python scripting. The particulars of the considerations, benefits, and implementation of the new framework are discussed in detail. The simulation output and run time of the new framework were compared to those of the old framework. The new framework was found to produce similar data outputs with a faster run time.
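The combination of an XML model file with scripted behaviour can be pictured with the following minimal Python sketch. It is an illustration only and does not reflect HSF's real input schema or scripting interface: the `<model>` and `<subsystem>` elements, the `script` attribute, and the functions `camera_can_perform`, `load_checks`, and `failed_requirements` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical model file: one plain subsystem and one with a scripted rule.
MODEL_XML = """
<model>
  <subsystem name="power"/>
  <subsystem name="camera" script="camera_can_perform"/>
</model>
"""

State = Dict[str, float]
Check = Callable[[State], bool]


def camera_can_perform(state: State) -> bool:
    """Hypothetical scripted rule: imaging needs at least 20 units of stored power."""
    return state["power"] >= 20.0


# Registry that maps script names used in the XML to Python callables.
SCRIPTS: Dict[str, Check] = {"camera_can_perform": camera_can_perform}


def load_checks(xml_text: str, scripts: Dict[str, Check]) -> Dict[str, Check]:
    """Parse the model and attach a scripted check to each subsystem that names one."""
    root = ET.fromstring(xml_text.strip())
    checks: Dict[str, Check] = {}
    for node in root.findall("subsystem"):
        script_name = node.get("script")
        if script_name is not None:
            checks[node.get("name")] = scripts[script_name]
    return checks


def failed_requirements(checks: Dict[str, Check], state: State) -> List[str]:
    """Return the names of subsystems whose scripted check fails in this state."""
    return sorted(name for name, check in checks.items() if not check(state))
```

The XML stays declarative while the scripted function holds the behaviour, so a rule can be changed without editing the model file.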
Getting More out of Biomedical Documents with GATE's Full Lifecycle Open Source Text Analytics.
This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most
widely used systems of its type with yearly download rates of tens of thousands and many active users in both academic
and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in
medicine. The first is in genome-wide association studies, where it has contributed to the discovery of a head and neck cancer
mutation association. The second is medical records analysis, which has significantly increased the statistical power of treatment/outcome
models in the UK's largest psychiatric patient cohort. The third supports richer constructs in drug-related searching. We also
explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude
that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process,
and that with the right computational tools and data collection strategies this process can be made defined and repeatable.
The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text
processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and
research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis
systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority
outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online
under GNU open source licences and runs on all major operating systems. Support is available from an active user and
developer community and also on a commercial basis.
Design a market hub platform for utilities
Master's dissertation in Computer Science
Software Engineering has been contributing, over the years, to a better and more
efficient production of software. Through its methodologies and processes, it has been
used to increase the assurance that the software produced is robust, of high quality, easy to
update and, above all, conformant to the requirements identified by the stakeholders.
With the growth of data sharing, collection and storage in the utilities sector,
there is also an urgent need to maintain and use the available information in a useful
way. This requires the adoption of strategies and infrastructures that can help leverage
this information to improve the quality of certain utility sectors
(gas, electricity, water, internet, communications). However, much of this process is still
controlled solely by the production and distribution companies, preventing
all other users from participating in it.
To reduce the dominance that production and distribution companies have over
the utilities panorama, and in support of the European Commission vision, the energy
sector is trying to move towards a liberalized market and, with this, it aims to enable all
entities such as consumers, retailers, producers, distributors, to contribute to the
management of the network and for new business models to emerge.
To sustain this ecosystem, where these entities can communicate and share data, it
would be advantageous to have a platform that would allow the communication of such
data between all users.
In this project, and through the application of SE techniques, we will develop, step
by step, a model for a modular, scalable and integrated environment to enable demand
response, data exploration and storage, and to comply with the European Union's General Data
Protection Regulation (GDPR).
This work is financed by the ERDF – European Regional Development Fund through
the Operational Programme for Competitiveness and Internationalisation - COMPETE
2020 Programme within project «POCI-01-0145-FEDER-006961», and by National
Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a
Tecnologia as part of Project «UID/EEA/50014/2013».
3rd EGEE User Forum
We have organized this book in a sequence of chapters, each associated with an application or technical theme and introduced by an overview of the contents and a summary of the main conclusions coming from the Forum for the chapter topic. The first chapter gathers all the plenary session keynote addresses, and following this there is a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with the flavour of Computer Science and Grid Technology. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, and so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.