133 research outputs found

    Combinatorial Interaction Testing for Automated Constraint Repair

    Highly configurable software systems can be easily adapted to users’ needs. Modelling parameter configurations and their relationships can facilitate software reuse. Combinatorial Interaction Testing (CIT) methods are already widely used to drive systematic testing of software system configurations. However, a model of the system’s configurations that does not conform to its software implementation must be repaired in order to restore conformance. In this paper we extend CIT by devising a new search-based technique able to repair a model composed of a set of constraints among the software system’s parameters. Our technique can be used to detect and fix faults both in the model and in the real software system. Experiments on five real-world systems show that our approach can repair on average 37% of conformance faults. Moreover, we show that it can infer parameter constraints in a large real-world software system, and hence can be used for the automated creation of CIT models.

    Validation of Constraints Among Configuration Parameters Using Search-Based Combinatorial Interaction Testing

    The appeal of highly configurable software systems lies in their adaptability to users’ needs. Search-based Combinatorial Interaction Testing (CIT) techniques have been developed specifically to drive the systematic testing of such systems. To apply them, it is paramount to devise a model of parameter configurations that conforms to the software implementation, which is a non-trivial task. We therefore extend traditional search-based CIT by devising four new testing policies able to check whether the model correctly identifies constraints among the software parameters. Our experiments show that one of our new policies is able to detect faults, both in the model and in the software implementation, that are missed by the standard approaches.
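    The idea of checking a constraint model against the implementation can be illustrated with a minimal sketch. It is not the search-based policy from the paper: it enumerates every configuration exhaustively instead of generating a covering test suite, and the parameters, the model constraint, and the `system_accepts` stand-in for the real system are all hypothetical.

```python
from itertools import product

# Hypothetical parameters and values, chosen only for illustration.
PARAMETERS = {
    "compression": ["off", "gzip"],
    "encryption": ["off", "aes"],
    "cache": ["off", "on"],
}


def model_allows(cfg):
    """Constraint model under test: encryption requires compression."""
    return not (cfg["encryption"] == "aes" and cfg["compression"] == "off")


def system_accepts(cfg):
    """Stand-in for running the real system (assumed behaviour).

    It enforces the model's constraint and, in addition, rejects
    encryption combined with caching, which the model does not know about.
    """
    if cfg["encryption"] == "aes" and cfg["compression"] == "off":
        return False
    if cfg["encryption"] == "aes" and cfg["cache"] == "on":
        return False
    return True


def conformance_faults(parameters, model, oracle):
    """Return every configuration on which model and system disagree."""
    names = list(parameters)
    faults = []
    for values in product(*(parameters[name] for name in names)):
        cfg = dict(zip(names, values))
        if model(cfg) != oracle(cfg):
            faults.append(cfg)
    return faults


faults = conformance_faults(PARAMETERS, model_allows, system_accepts)
for cfg in faults:
    print("model and system disagree on:", cfg)
```

    Each reported configuration points to either a missing or wrong constraint in the model or a fault in the implementation; deciding which one it is still requires inspection.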

    Efficient Learning and Evaluation of Complex Concepts in Inductive Logic Programming

    Inductive Logic Programming (ILP) is a subfield of Machine Learning with foundations in logic programming. In ILP, logic programming, a subset of first-order logic, is used as a uniform representation language for the problem specification and induced theories. ILP has been successfully applied to many real-world problems, especially in the biological domain (e.g. drug design, protein structure prediction), where relational information is of particular importance. The expressiveness of logic programs grants flexibility in specifying the learning task and understandability to the induced theories. However, this flexibility comes at a high computational cost, constraining the applicability of ILP systems. Constructing and evaluating complex concepts remain two of the main issues that prevent ILP systems from tackling many learning problems. These learning problems are interesting both from a research perspective, as they raise the standards for ILP systems, and from an application perspective, where these target concepts naturally occur in many real-world applications. Such complex concepts cannot be constructed or evaluated by parallelizing existing top-down ILP systems or improving the underlying Prolog engine. Novel search strategies and cover algorithms are needed. The main focus of this thesis is on how to efficiently construct and evaluate complex hypotheses in an ILP setting. In order to construct such hypotheses we investigate two approaches. The first, the Top Directed Hypothesis Derivation framework, implemented in the ILP system TopLog, involves the use of a top theory to constrain the hypothesis space. In the second approach we revisit the bottom-up search strategy of Golem, lifting its restriction on determinate clauses which had rendered Golem inapplicable to many key areas. These developments led to the bottom-up ILP system ProGolem. A challenge that arises with a bottom-up approach is the coverage computation of long, non-determinate, clauses. 
Prolog’s SLD-resolution is no longer adequate for this task. We developed a new, Prolog-based theta-subsumption engine which is significantly more efficient than SLD-resolution in computing the coverage of such complex clauses. We provide evidence that ProGolem achieves the goal of learning complex concepts by presenting a protein-hexose binding prediction application. The theory ProGolem induced has a statistically significantly better predictive accuracy than that of other learners. More importantly, the biological insights ProGolem’s theory provided were judged by domain experts to be relevant and, in some cases, novel.
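    For readers unfamiliar with theta-subsumption, the following is a naive backtracking sketch of the test itself: clause C theta-subsumes clause D if some substitution applied to C makes every literal of C appear in D. It is not the engine developed in the thesis. The representation (literals as tuples, variables as capitalised strings, D treated as ground) and the example literals are assumptions made for illustration.

```python
def is_variable(term):
    """Variables are strings starting with an uppercase letter (Prolog style)."""
    return isinstance(term, str) and term[:1].isupper()


def match_literal(pattern, target, theta):
    """Extend substitution theta so that pattern maps onto target, or return None."""
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    theta = dict(theta)  # copy so that failed branches leave no bindings behind
    for p, t in zip(pattern[1:], target[1:]):
        if is_variable(p):
            if theta.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constants must match exactly
    return theta


def theta_subsumes(clause_c, clause_d, theta=None):
    """True if some substitution maps every literal of clause_c into clause_d."""
    theta = {} if theta is None else theta
    if not clause_c:
        return True
    first, rest = clause_c[0], clause_c[1:]
    for literal in clause_d:
        extended = match_literal(first, literal, theta)
        if extended is not None and theta_subsumes(rest, clause_d, extended):
            return True
    return False


# Hypothetical example: a two-literal pattern tested against a ground clause.
pattern = [("bond", "X", "Y"), ("atom", "Y", "oxygen")]
example = [
    ("bond", "a1", "a2"),
    ("bond", "a2", "a3"),
    ("atom", "a3", "oxygen"),
    ("atom", "a1", "carbon"),
]
print(theta_subsumes(pattern, example))
```

    The search backtracks over every literal of D for each literal of C, so its worst-case cost grows exponentially with clause length, which is why long non-determinate clauses call for a more efficient engine.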

    What kind of questions do developers ask on Stack Overflow? A comparison of automated approaches to classify posts into question categories

    On question and answer sites such as Stack Overflow (SO), developers use tags to label the content of a post and to support other developers in searching and browsing questions. However, these tags mainly refer to technological aspects rather than to the purpose of the question. Tagging questions with their purpose can add a new dimension to the identification of topics discussed in SO posts. In this paper, we aim to automate the classification of SO question posts into seven question categories. As a first step, we harmonized existing taxonomies of question categories and then manually classified 1,000 SO questions according to our new taxonomy. In addition to the question category, we marked the phrases that indicate a question category for each of the posts. We then used this data set to automate the classification of posts using two approaches. For the first approach, we manually analyzed the marked phrases to find patterns and, based on regular expressions derived from these patterns, implemented a classifier for each category that determines whether a post belongs to that category. In the second approach, we used the curated data set to train supervised machine learning classifiers (Random Forest and Support Vector Machines). For the machine learning algorithms, we experimented with 1,312 different configurations regarding the preprocessing of the text and the representation of the input data. We then compared the performance of the regex approach with that of the best machine learning configuration on a validation set of 110 posts. The results show that, using the regular expression approach, we can classify posts into the correct question category with an average precision and recall of 0.90, and an MCC of 0.68. 
Additionally, we applied the regex approach to all SO questions that deal with Android app development and investigated the co-occurrence of question categories in posts. We found that the categories API usage, Conceptual, and Discrepancy are the most frequently assigned question categories and that they also frequently occur together. Our approach can be used to support developers in browsing SO discussions or researchers in building recommender systems based on SO.
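    A minimal sketch of the regex approach, assuming one binary classifier per category so that a post may receive several categories. The three category names come from the abstract, but the regular expressions are hypothetical and are not the ones derived in the paper. The `mcc` helper computes the Matthews correlation coefficient from a binary confusion matrix.

```python
import re
from math import sqrt

# Hypothetical patterns for illustration; not the expressions from the paper.
PATTERNS = {
    "API usage": re.compile(r"\bhow (?:do|can|to)\b", re.IGNORECASE),
    "Discrepancy": re.compile(
        r"\b(?:does not work|doesn't work|not working|unexpected)\b",
        re.IGNORECASE,
    ),
    "Conceptual": re.compile(
        r"\b(?:what is the difference|why does|is it possible)\b",
        re.IGNORECASE,
    ),
}


def classify(post):
    """Return the set of categories whose pattern occurs in the post text."""
    return {category for category, pattern in PATTERNS.items() if pattern.search(post)}


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one binary classifier."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


post = "How do I parse JSON in Android? My code is not working."
print(sorted(classify(post)))
```

    Because every category has its own independent pattern, co-occurrence of categories within a post falls out directly from the returned set.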

    DESIGN OF GENETIC ELEMENTS AND SOFTWARE TOOLS FOR PLANT SYNTHETIC BIOLOGY

    Thesis by compendium of publications. Synthetic Biology is an emerging interdisciplinary field that aims to apply the engineering principles of modularity, abstraction and standardization to genetic engineering. The nascent branch of Synthetic Biology devoted to plants, Plant Synthetic Biology (PSB), offers new breeding possibilities for crops, potentially leading to enhanced resistance, higher yield, or increased nutritional quality. To this end, the molecular tools in the PSB toolbox need to be adapted to become modular, standardized and more precise. Thus, the overall objective of this Thesis was to adapt, expand and refine DNA assembly tools for PSB to enable the incorporation of functional specifications into the description of standard genetic elements (phytobricks) and to facilitate the construction of increasingly complex and precise multigenic devices, including genome editing tools. The starting point of this Thesis was the modular DNA assembly method known as GoldenBraid (GB), based on type IIS restriction enzymes. To further optimize the GB construct-making process and to better catalog the phytobricks collection, a database and a set of software tools were developed, as described in Chapter 1. The final web-based software package, released as GB2.0, was made publicly available at www.gbcloning.upv.es. Chapter 1 also provides a detailed description of the functioning of GB2.0, exemplified with the building of a multigene construct for anthocyanin overproduction. As the number and complexity of GB constructs increased, the next step consisted of refining the standards by incorporating the experimental information associated with each genetic element (described in Chapter 2). To this end, the GB package was reshaped into an improved version (GB3.0), a self-contained, fully traceable assembly system in which the experimental data describing the functionality of each DNA element is displayed in the form of a standard datasheet. The utility of the technical specifications for anticipating the behavior of composite devices was exemplified by combining a chemical switch with a prototype of an anthocyanin overproduction module equivalent to the one described in Chapter 1, resulting in a dexamethasone-responsive anthocyanin device. Furthermore, Chapter 3 describes the adaptation of CRISPR/Cas9 genome engineering tools to the GB technology and their functional characterization. The performance of the adapted tools for gene editing, transcriptional activation and repression was successfully validated by transient expression in N. benthamiana. Finally, Chapter 4 presents a practical implementation of GB technology for precision plant breeding. An intragenic construct comprising an intragenic selectable marker and a master regulator of flavonoid biosynthesis was stably transformed in tomato, resulting in fruits with enhanced flavonol content. Altogether, this Thesis shows the implementation of increasingly complex and precise genetic designs in plants using standard elements and modular tools following the principles of Synthetic Biology.
    Vázquez Vilar, M. (2016). DESIGN OF GENETIC ELEMENTS AND SOFTWARE TOOLS FOR PLANT SYNTHETIC BIOLOGY [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/68483

    Web-sovelluskehityksen tehostaminen Scala-ohjelmointikielellä [Improving the efficiency of Web application development with the Scala programming language]

    There are a number of alternative technologies for Web application development. The final product as seen by the end user might not reflect the underlying technologies, but for the customer the choice is visible in project costs. The chosen technologies also affect the overall quality of the product and play an important role in making the development work pleasant. Most of the available technologies are based on dynamic programming languages, so a statically typed language would offer an interesting alternative. This thesis examines the suitability of the Scala programming language for traditional Web development and evaluates its advantages and disadvantages. Scala is bytecode compatible with the widely used Java language, but its use of the functional paradigm and its potentially quite different syntax make it somewhat hard to learn for an average Java programmer. Therefore the basics and the most relevant techniques of the Scala language are introduced. The similarities to and differences from Java are brought up, and the available tool support for Scala is examined and briefly compared to the excellence of Java tools. The thesis implements some common Web development functionality in Scala, and the discovered advantages and disadvantages are evaluated against alternative methods. Program code listings offer a concrete view of the subject. Some interesting possibilities were discovered that could have a significant positive impact on software development. A simple example project investigates the observations in practice, with the aim of bringing out issues that could also arise in real-life projects. The project was implemented in both Java and Scala, which shows that Scala can be used in some parts of a project without major interference with other parts. A simple performance test run against the example project shows that the Scala implementation does not suffer from performance issues compared to the Java implementation.

    Recognition of short functional motifs in protein sequences

    The main goal of this study was to develop a method for computational de novo prediction of short linear motifs (SLiMs) in protein sequences that would provide advantages over existing solutions for its users. The users are typically biological laboratory researchers who want to elucidate a function of a protein that is possibly mediated by a short motif. Such functions include subcellular localization, secretion, post-translational modification or degradation of proteins. Conducting such studies with experimental techniques alone is often associated with high costs and risks of uncertainty. Preliminary prediction of putative motifs with computational methods, which are fast and much less expensive, makes it possible to generate hypotheses and therefore to plan experiments in a more directed and efficient way. To meet this goal, I have developed HH-MOTiF, a web-based tool for de novo discovery of SLiMs in a set of protein sequences. While working on the project, I have also detected patterns in the sequence properties of certain SLiMs that make their de novo prediction easier. As some of these patterns are not yet described in the literature, I am sharing them in this thesis. While evaluating and comparing motif prediction results, I have identified conceptual gaps in theoretical studies, as well as in existing practical solutions, for comparing two sets of positional data annotating the same set of biological sequences. To close this gap and to be able to carry out in-depth performance analyses of HH-MOTiF in comparison to other predictors, I have developed a corresponding statistical method, SLALOM (StatisticaL Analysis of Locus Overlap Method). It is currently available as a standalone command line tool.
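    To make the problem of comparing two sets of positional annotations concrete, the sketch below computes position-level precision and recall between reference and predicted motif intervals on one sequence. It is not SLALOM's method or interface; the interval representation (inclusive 1-based start and end) and the function names are assumptions made for illustration.

```python
def covered_positions(intervals):
    """Expand inclusive (start, end) intervals into the set of covered positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def overlap_metrics(reference, predicted):
    """Return (precision, recall) counted over individual sequence positions."""
    ref = covered_positions(reference)
    pred = covered_positions(predicted)
    shared = len(ref & pred)
    precision = shared / len(pred) if pred else 0.0
    recall = shared / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical annotations on a single sequence.
reference_motifs = [(10, 15)]
predicted_motifs = [(13, 20)]
precision, recall = overlap_metrics(reference_motifs, predicted_motifs)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

    Counting positions is only one possible convention; counting whole motifs that overlap at all gives different numbers for the same data, which is the kind of ambiguity a dedicated statistical method has to resolve.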