325 research outputs found
Architectural Refactoring for Fast and Modular Bioinformatics Sequence Search
Bioinformaticists use the Basic Local Alignment Search Tool (BLAST) to characterize an unknown sequence by
comparing it against a database of known sequences, thus detecting evolutionary relationships and biological properties. mpiBLAST is a widely-used, high-performance, open-source parallelization of BLAST that runs on a computer cluster delivering super-linear speedups. However, the Achilles heel of mpiBLAST is its lack of modularity, adversely affecting maintainability and extensibility; an effective architectural refactoring will benefit both users and developers.
This paper describes our experiences in the architectural refactoring of mpiBLAST into a modular, high-performance software package. Our evaluation of five component-oriented designs culminated in a design that enables modularity while retaining high performance. Furthermore, we achieved this refactoring effectively and efficiently using eXtreme Programming techniques. These experiences will be of value to software engineers faced with the challenge of creating maintainable, extensible, high-performance bioinformatics software.
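At its core, the comparison BLAST performs is a local alignment between a query sequence and database sequences. BLAST itself relies on seeded heuristics for speed; the minimal Python sketch below of the classic Smith-Waterman dynamic program (with illustrative scoring parameters, not BLAST's defaults) shows only the underlying scoring idea.

```python
# Minimal Smith-Waterman local alignment. BLAST uses seeded heuristics
# rather than full dynamic programming; this sketch only illustrates
# the underlying comparison. Scoring values are illustrative assumptions.
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]   # DP score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            # local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best  # score of the best local alignment

print(smith_waterman("ACACACTA", "AGCACACA"))
```

A parallelization such as mpiBLAST partitions the database across cluster nodes so each node scores the query against its fragment independently.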
ASCR/HEP Exascale Requirements Review Report
This draft report summarizes and details the findings, results, and
recommendations derived from the ASCR/HEP Exascale Requirements Review meeting
held in June 2015. The main conclusions are as follows. 1) Larger, more
capable computing and data facilities are needed to support HEP science goals
in all three frontiers: Energy, Intensity, and Cosmic. On the 2025 timescale,
the expected demand is at least two orders of magnitude greater -- and in some
cases more -- than what is currently available. 2) The growth rate of data
produced by simulations is overwhelming the current ability of both facilities
and researchers to store and analyze it. Additional resources and new
techniques for data analysis are urgently needed. 3) Data rates and volumes
from HEP experimental facilities are also straining the ability to store and
analyze large and complex data volumes. Appropriately configured
leadership-class facilities can play a transformational role in enabling
scientific discovery from these datasets. 4) A close integration of HPC
simulation and data analysis will aid greatly in interpreting results from HEP
experiments. Such an integration will minimize data movement and facilitate
interdependent workflows. 5) Long-range planning between HEP and ASCR will be
required to meet HEP's research needs. To best use ASCR HPC resources, the
experimental HEP program needs: a) an established long-term plan for access to
ASCR computational and data resources; b) the ability to map workflows onto HPC
resources; c) the ability for ASCR facilities to accommodate workflows run by
collaborations that can have thousands of individual members; d) the ability to
transition codes to the next-generation HPC platforms that will be available at
ASCR facilities; and e) a trained workforce capable of developing and using
simulations and analysis to support HEP scientific research on next-generation
systems.
Comment: 77 pages, 13 figures; draft report, subject to further revision
Engineering Enterprise Software Systems with Interactive UML Models and Aspect-Oriented Middleware
Large-scale enterprise software systems are inherently complex and hard to maintain. To deal with this complexity, current mainstream software engineering practices aim at raising the level of abstraction to visual models described in OMG’s UML modeling language. Current UML tools, however, produce static design diagrams for documentation which quickly become out-of-sync with the software, and thus obsolete. To address this issue, current model-driven software development approaches aim at software automation using generators that translate models into code. However, these solutions do not adequately address legacy source code or the evolution of existing enterprise software systems. This research investigates an alternative solution by making the process of modeling more interactive with a simulator and integrating simulation with the live software system. Such an approach supports model-driven development at a higher level of abstraction with models while retaining the ability to drop down to code. Additionally, simulation supports better evolution, since the impact of a change to a particular area of existing software can be better understood using simulated “what-if” scenarios. This project proposes such a solution by developing a web-based UML simulator for modeling use cases and sequence diagrams and integrating the simulator with existing applications using aspect-oriented middleware technology.
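The aspect-oriented integration described above hinges on intercepting calls in the existing application without editing its source. A rough, language-neutral sketch of that interception idea follows; real middleware such as AspectJ weaves advice into bytecode, and all names below are hypothetical.

```python
# Sketch of aspect-oriented "advice": wrap an existing method so calls
# are reported to an external observer (here, a simulator stand-in)
# without editing the method's source. All names are hypothetical.
def advise(cls, method_name, before):
    original = getattr(cls, method_name)
    def wrapper(self, *args, **kwargs):
        before(method_name, args)            # notify the observer first
        return original(self, *args, **kwargs)
    setattr(cls, method_name, wrapper)       # weave the advice in place

class Order:                                 # legacy class, source untouched
    def place(self, item):
        return f"placed {item}"

trace = []                                   # the "simulator" records calls
advise(Order, "place", lambda name, args: trace.append((name, args)))
print(Order().place("book"))                 # original behavior preserved
print(trace)                                 # interception observed the call
```

The same pattern lets a live system feed events back to a model simulator without modifying the application code itself.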
Are Delayed Issues Harder to Resolve? Revisiting Cost-to-Fix of Defects throughout the Lifecycle
Many practitioners and academics believe in a delayed issue effect (DIE);
i.e. the longer an issue lingers in the system, the more effort it requires to
resolve. This belief is often used to justify major investments in new
development processes that promise to retire more issues sooner.
This paper tests for the delayed issue effect in 171 software projects
conducted around the world between 2006 and 2014. To the best of our
knowledge, this is the largest study yet published on this effect. We found no
evidence for the delayed issue effect; i.e. the effort to resolve issues in a
later phase was not consistently or substantially greater than when issues were
resolved soon after their introduction.
This paper documents the above study and explores reasons for this mismatch
between this common rule of thumb and empirical data. In summary, DIE is not
a constant across all projects. Rather, DIE may be a historical relic that
occurs intermittently only in certain kinds of projects. This is a significant
result, since it predicts that new development processes that promise to
retire more issues faster will not have a guaranteed return on investment
(depending on the context where applied), and that a long-held truth in
software engineering should not be considered a global truism.
Comment: 31 pages. Accepted with minor revisions to Journal of Empirical
Software Engineering. Keywords: software economics, phase delay, cost to fix
Leveraging Software Clones for Software Comprehension: Techniques and Practice
RÉSUMÉ
The body of this thesis centers on two aspects of software clone analysis: detection and application.
In detection, the main contribution of this thesis is a new clone detector built with the mtreelib library, itself developed expressly for this work. This library implements a general metric tree, a data structure specialized in partitioning metric spaces in order to accelerate certain common queries, such as range queries and nearest-neighbor queries. This structure is used to build a clone detector that approximates the Levenshtein distance with high accuracy. A brief evaluation is presented to support this accuracy. Other relevant results on metrics and incremental clone detection are
also presented.
Several applications of the new clone detector are presented. First, an original algorithm for reconstructing information lost in version control systems is proposed and tested on several large systems. Then, a qualitative and quantitative evaluation of Firefox is carried out based on a nearest-neighbor analysis; the resulting curves are used to highlight the difficulties of transitioning from a slow to a rapid development cycle. Next, two industrial experiences in using
and deploying a clone detection technology are presented. These two experiences concern the C/C++, Java, and TTCN-3 languages. The large difference in clone populations between C/C++ and Java on the one hand and TTCN-3 on the other is presented. Finally, a result obtained by crossing a clone analysis with a security flow analysis highlights the usefulness of clones in identifying security flaws.
The work ends with a conclusion and some future perspectives.----------ABSTRACT
This thesis explores two topics in clone analysis: detection and application.
The main contribution in clone detection is a new clone detector based on a library called mtreelib, a package developed for this work that implements a metric-tree data structure. This structure is used to build a clone detector that approximates the Levenshtein distance with high accuracy, and a small benchmark is produced to assess that accuracy. Further results regarding metrics and incremental clone detection are also presented.
Many applications of the clone detector are introduced. An original algorithm to reconstruct missing information in the structure of software repositories is described and tested with data sourced from large existing software. An insight into Firefox is exposed showing the quantity of change between versions and the link between different release cycle types and the number of bugs. Also, an analysis crossing the results from pattern traversal, flow
analysis and clone detection is presented. Two industrial experiments using a different clone detector, CLAN, are also presented with some developers’ perspectives. One of the experiments is done on a language never explored in clone detection, TTCN-3, and the results show that the clone population in that language differs greatly from other well-known languages, like C/C++ and Java.
The thesis concludes with a summary of the findings and some perspectives for future research.
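The metric-tree approach described above works because Levenshtein distance satisfies the triangle inequality, so precomputed distances to a pivot can prune candidates in a range query. A flat Python sketch of that pruning idea follows; it is not the actual mtreelib M-tree, and all names are illustrative.

```python
# Sketch of the metric-space pruning behind an M-tree-style index:
# since Levenshtein distance obeys the triangle inequality,
# |d(q,p) - d(x,p)| > r implies d(q,x) > r, so x can be skipped
# without computing d(q,x). Flat illustration, not a real M-tree.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))           # standard DP, one row at a time
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,      # deletion
                           cur[-1] + 1,      # insertion
                           prev[j-1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def range_query(pivot, items, query, radius):
    d_qp = levenshtein(query, pivot)
    hits = []
    for item, d_ip in items:                 # d_ip precomputed at build time
        if abs(d_qp - d_ip) > radius:        # triangle-inequality prune
            continue
        if levenshtein(query, item) <= radius:
            hits.append(item)
    return hits

words = ["clone", "close", "cloud", "alone", "flame"]
pivot = "clone"
index = [(w, levenshtein(w, pivot)) for w in words]
print(range_query(pivot, index, "clones", 1))  # -> ['clone']
```

A real metric tree applies this prune recursively at every routing node, which is what makes near-miss clone queries scale to large code bases.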
Engineering of Streptomyces albus J1074 and Streptomyces lividans TK24 for natural products production
Actinobacteria have remarkable chemical potential that remains unexplored due to the low expression levels of the corresponding genes. To uncover this reservoir of natural products, we developed a reporter-guided screening strategy combined with transposon mutagenesis and used it to activate a silent polycyclic tetramate macrolactam biosynthesis gene cluster in Streptomyces albus J1074. As a result, a mutant with awakened secondary metabolism was obtained. Analysis of this strain led to the identification of a new regulatory system consisting of the transcriptional regulator XNR_3174 and a bacterial hormone-like butenolide compound. Orthologues of XNR_3174 and the butenolide biosynthesis genes are present in the genomes of different Streptomyces species. The identified regulatory system comprises a new condition-dependent cascade controlling secondary metabolism in Actinobacteria. We also developed new host strains for heterologous production of natural products by deleting 11 endogenous secondary metabolite gene clusters from the chromosome of S. lividans and introducing up to two sites for integration of foreign DNA. When expressing three heterologous gene clusters, the engineered hosts showed better performance than the parental strain. S. lividans TK24 was also improved as a host for heterologous protein production by deleting a set of protease-encoding genes. The developed strains represent a step forward to a better panel of organisms for bioprospecting and genome mining of novel natural products.
Investigation on the Future of Enterprise Architecture in Dynamic Environments
In today's economy, constant change has become the new normal. The consequences of this development are vividly visible: dynamics in corporate environments are increasing, and companies that do not adapt to changing conditions will be less successful and will eventually close down.
While developing and improving the adaptive capabilities needed to succeed in dynamic environments requires the joint effort of many parts of the enterprise, Enterprise Architecture (EA) can play a vital role by enabling and guiding different organizational elements to be more effective in dynamic environments. To do so, however, EA needs to transform itself. This thesis offers results that describe how EA can be effective in dynamic environments. The results are structured according to the following four areas.
First, a state-of-the-art review of EA is presented, describing the development of the discipline over the last three decades. The analysis makes evident that the focus of EA research has moved from understanding and defining EA toward effectively managing the discipline in complex business environments. The later parts of this thesis likewise emphasize the effective management of EA by providing EA approaches for specific circumstances, namely environments with a higher pace of change.
Second, this thesis offers a formal description of how the effects of the increasing pace of change influence the effectiveness of EA. The primary result of this part is a model, based on complexity theory, that summarizes the following dependencies: the increasing pace of change leads to greater dynamic complexity for EA, since there is a need to manage parts that are changing more and faster. This complexity must be considered from both a business and a technological point of view. In the final model, dynamic business and technological complexity are treated as contextual factors that influence the proper use of EA and, consequently, the effectiveness of the discipline.
Third, a collection of approaches for improving the effectiveness of EA in dynamic environments is presented. These are structured around four dimensions: EA competence, which considers who in the organization works on EA; EA methodology, which considers how EA is executed in the organization; EA content, which considers the output of EA; and EA tools, which consider what EA is created and maintained with.
Fourth, the final part of this thesis presents the results in the form of a reference architecture for EA in dynamic environments. The EA approaches are again structured according to the dimensions described above, and the reference architecture is described at the level of individual approaches as well as at the level of dimensions. In summary, the EA competence should be well integrated into the enterprise. In addition, the EA methodology should be aligned with agile practices that allow fast architectural decisions. The resulting EA content should be adaptive, meaning that the architecture can easily be adjusted when necessary. Finally, architects and other EA stakeholders should be supported by modern EA tools.
This thesis shows that the underlying goal of EA, namely ensuring the alignment of different facets within the enterprise, remains necessary even under today's changing conditions. However, architects working in dynamic environments should revisit the described dimensions (who? how? what? with what?) in their EA practice in order to remain effective.
With its results, this thesis presents a guide for practitioners so that they can make appropriate decisions and thus optimize the effectiveness of EA in dynamic environments. At the same time, it contributes to academic knowledge about EA: the models and approaches presented address the gap regarding a current holistic approach to EA in dynamic environments. In addition, this thesis points out several areas that provide opportunities for future research, which will hopefully inspire researchers to further drive the evolution of EA from an academic point of view.
Administración y Dirección de Empresa