
    Edge-weighting of gene expression graphs

    In recent years, considerable research effort has been directed to microarray technologies and their role in providing simultaneous information on expression profiles for thousands of genes. These data, when subjected to clustering and classification procedures, can help identify patterns and provide insight into biological processes. Graphical representations can be used to understand the properties of complex gene expression datasets. Intuitively, the data can be represented as a bipartite graph, with weighted edges corresponding to gene-sample node pairs in the dataset. Biologically meaningful subgraphs can then be sought, but performance is influenced both by the search algorithm and by the graph-weighting scheme, and both merit rigorous investigation. In this paper, we focus on edge-weighting schemes for bipartite graphical representations of gene expression. Two novel methods are presented: the first is based on empirical evidence; the second on a geometric distribution. The schemes are compared on several real datasets, assessing efficiency of performance against four essential properties: robustness to noise and missing values, discrimination, parameter influence on scheme efficiency, and reusability. Recommendations and limitations are briefly discussed.
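
    A minimal sketch of the bipartite representation the abstract describes, assuming an expression matrix indexed by gene and sample. The weighting function here (absolute expression value) is a placeholder for illustration only, not either of the paper's two proposed schemes.

        # Sketch: bipartite gene-sample graph built from an expression matrix.
        # The edge weight (absolute expression) is a stand-in, NOT the
        # empirical or geometric schemes proposed in the paper.
        import networkx as nx
        import numpy as np

        def build_bipartite_graph(expr, genes, samples):
            """expr[i, j] holds the expression of genes[i] in samples[j]."""
            G = nx.Graph()
            G.add_nodes_from(genes, bipartite=0)    # gene partition
            G.add_nodes_from(samples, bipartite=1)  # sample partition
            for i, g in enumerate(genes):
                for j, s in enumerate(samples):
                    value = expr[i, j]
                    if not np.isnan(value):         # skip missing measurements
                        G.add_edge(g, s, weight=abs(value))
            return G

        expr = np.array([[2.1, -0.3], [np.nan, 1.7]])
        G = build_bipartite_graph(expr, ["geneA", "geneB"], ["s1", "s2"])
        print(G.edges(data=True))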

    What Am I Testing and Where? Comparing Testing Procedures based on Lightweight Requirements Annotations

    [Context] The testing of software-intensive systems is performed in different test stages, each comprising a large number of test cases. These test cases are commonly derived from requirements. Each test stage exhibits specific demands and constraints with respect to its degree of detail and what can be tested. Therefore, specific test suites are defined for each test stage. In this paper, the focus is on the domain of embedded systems, where, among others, typical test stages are Software- and Hardware-in-the-Loop. [Objective] Monitoring and controlling which requirements are verified in which detail and in which test stage is a challenge for engineers. However, this information is necessary to assure a certain test coverage, to minimize redundant testing procedures, and to avoid inconsistencies between test stages. In addition, engineers are reluctant to state their requirements in terms of structured languages or models that would facilitate relating requirements to test executions. [Method] With our approach, we close the gap between requirements specifications and test executions. Previously, we proposed a lightweight markup language for requirements that provides a set of annotations applicable to natural language requirements. The annotations are mapped to events and signals in test executions. As a result, meaningful insights from a set of test executions can be related directly to artifacts in the requirements specification. In this paper, we use the markup language to compare different test stages with one another. [Results] We annotate 443 natural language requirements of a driver assistance system using our lightweight markup language. The annotations are then linked to 1,300 test executions from a simulation environment and 53 test executions from test drives with human drivers. Based on the annotations, we are able to analyze how similar the test stages are and how well test stages and test cases are aligned with the requirements. Further, we highlight the general applicability of our approach through this extensive experimental evaluation. [Conclusion] With our approach, the results of several test stages are linked to the requirements, enabling the evaluation of complex test executions. By this means, practitioners can easily evaluate how well a system performs with regard to its specification and, additionally, can reason about the expressiveness of the applied test stage.
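
    An illustrative sketch of the linking idea, under stated assumptions: the signal names and the mapping from annotations to observed signals below are hypothetical, not the paper's actual markup language. Here a requirement counts as covered by a test stage once all of its annotated signals were observed in that stage's executions.

        # Hypothetical sketch: requirements carry annotated signal names;
        # each test stage records the set of signals observed across its
        # test executions. Coverage = all annotated signals were observed.
        from dataclasses import dataclass, field

        @dataclass
        class Requirement:
            rid: str
            text: str
            signals: set = field(default_factory=set)  # annotated signals

        def coverage_by_stage(requirements, stage_signals):
            """Map each test stage to the requirements whose annotated
            signals were all observed in that stage's executions."""
            covered = {}
            for stage, observed in stage_signals.items():
                covered[stage] = [r.rid for r in requirements
                                  if r.signals and r.signals <= observed]
            return covered

        reqs = [Requirement("REQ-1", "Brake when obstacle ahead",
                            {"obstacle_detected", "brake_applied"}),
                Requirement("REQ-2", "Warn driver on lane departure",
                            {"lane_departure", "warning_shown"})]
        stages = {"SiL": {"obstacle_detected", "brake_applied",
                          "lane_departure"},
                  "test_drive": {"obstacle_detected", "brake_applied",
                                 "lane_departure", "warning_shown"}}
        print(coverage_by_stage(reqs, stages))
        # {'SiL': ['REQ-1'], 'test_drive': ['REQ-1', 'REQ-2']}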

    Test Case Selection and Prioritization Using Machine Learning: A Systematic Literature Review

    Regression testing is an essential activity to assure that software code changes do not adversely affect existing functionality. With the wide adoption of Continuous Integration (CI) in software projects, which increases the frequency of running software builds, running all tests can be time-consuming and resource-intensive. To alleviate that problem, Test Case Selection and Prioritization (TSP) techniques have been proposed to improve regression testing by selecting and prioritizing test cases in order to provide early feedback to developers. In recent years, researchers have relied on Machine Learning (ML) techniques to achieve effective TSP (ML-based TSP). Such techniques help combine information about test cases, from partial and imperfect sources, into accurate prediction models. This work conducts a systematic literature review of ML-based TSP techniques, aiming to perform an in-depth analysis of the state of the art and thus gain insights regarding future avenues of research. To that end, we analyze 29 primary studies published from 2006 to 2020, identified through a systematic and documented process. This paper addresses five research questions covering variations in ML-based TSP techniques, feature sets for training and testing ML models, alternative metrics used for evaluating the techniques, the performance of the techniques, and the reproducibility of the published studies.
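
    A hedged sketch of one common ML-based TSP formulation, assuming a supervised setup in which a classifier predicts failure probability from per-test features and tests run in descending order of that probability. The feature set below is a toy example, not one drawn from the surveyed studies.

        # Toy sketch of ML-based test case prioritization: train on past
        # CI outcomes, then rank new test runs by predicted failure risk.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        # Columns: [recent failure rate, builds since last run, exec time (s)]
        X_hist = np.array([[0.9, 1, 12.0],
                           [0.1, 5, 3.0],
                           [0.5, 2, 8.0],
                           [0.0, 9, 1.0]])
        y_hist = np.array([1, 0, 1, 0])  # 1 = failed in past CI builds

        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X_hist, y_hist)

        new_tests = {"t_brake": [0.8, 1, 10.0], "t_ui": [0.05, 7, 2.0]}
        scores = {name: model.predict_proba([feats])[0][1]
                  for name, feats in new_tests.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        print(ranked)  # run the likely-failing tests first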

    Efficient Network Domination for Life Science Applications

    With the ever-increasing size of data available to researchers, traditional methods of analysis often cannot scale to match the problems being studied. Often only a subset of variables can be utilized or studied further, motivating the need for techniques that can prioritize variable selection. This dissertation describes the development and application of graph theoretic techniques, particularly the notion of domination, for this purpose. In the first part of this dissertation, algorithms for vertex prioritization in the field of network controllability are studied. Here, the number of solutions to which a vertex belongs is used to classify that vertex and determine its suitability for controlling a network. Novel, efficient, scalable algorithms are developed and analyzed. Empirical tests demonstrate the improvement of these algorithms over those already established in the literature. The second part of this dissertation concerns the prioritization of genes for loss-of-function allele studies in mice. The International Mouse Phenotyping Consortium leads the initiative to develop a loss-of-function allele for each protein-coding gene in the mouse genome. Only a small proportion of untested genes can be selected for further study. To address the need to prioritize genes, a generalizable data science strategy is developed. This strategy models genes as a gene-similarity graph and from it selects a subset for further characterization. Empirical tests demonstrate the method's utility over that of pseudorandom selection and less computationally demanding methods. Finally, part three addresses the important task of preprocessing in the context of noisy public health data. Many public health databases have been developed to collect, curate, and store a variety of environmental measurements. Idiosyncrasies in these measurements, however, introduce noise to the data found in these databases in several ways, including missing, incorrect, outlying, and incompatible data. Beyond noisy data, multiple measurements of similar variables can introduce problems of multicollinearity. Domination is again employed in a novel graph method to handle autocorrelation. Empirical results using the Public Health Exposome dataset are reported. Together these three parts demonstrate the utility of subset selection via domination when applied to a multitude of data sources from a variety of disciplines in the life sciences.
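
    As a sketch of domination-based subset selection, the standard greedy minimum-dominating-set heuristic below repeatedly picks the vertex whose closed neighborhood covers the most still-undominated vertices. This is a textbook heuristic run on a toy similarity graph, not the dissertation's specific algorithms or data.

        # Greedy minimum-dominating-set heuristic: every vertex ends up
        # either selected or adjacent to a selected vertex, so the chosen
        # set acts as a small representative subset of the graph.
        import networkx as nx

        def greedy_dominating_set(G):
            undominated = set(G.nodes)
            dom = set()
            while undominated:
                # pick the vertex whose closed neighborhood covers the
                # most still-undominated vertices
                best = max(G.nodes,
                           key=lambda v: len(({v} | set(G[v])) & undominated))
                dom.add(best)
                undominated -= {best} | set(G[best])
            return dom

        # Toy gene-similarity graph: edges link genes with similar profiles.
        G = nx.Graph([("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g4", "g5")])
        print(greedy_dominating_set(G))  # a small set such as {'g2', 'g4'}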