8,770 research outputs found

    Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects

    Get PDF
    Machine learning is nowadays a standard technique for data analysis within software applications. Software engineers need quality assurance techniques that are suitable for these new kinds of systems. Within this article, we discuss the question whether standard software testing techniques that have been part of textbooks since decades are also useful for the testing of machine learning software. Concretely, we try to determine generic and simple smoke tests that can be used to assert that basic functions can be executed without crashing. We found that we can derive such tests using techniques similar to equivalence classes and boundary value analysis. Moreover, we found that these concepts can also be applied to hyperparameters, to further improve the quality of the smoke tests. Even though our approach is almost trivial, we were able to find bugs in all three machine learning libraries that we tested and severe bugs in two of the three libraries. This demonstrates that common software testing techniques are still valid in the age of machine learning and that considerations how they can be adapted to this new context can help to find and prevent severe bugs, even in mature machine learning libraries.Comment: Accepted at Empirical Software Engineering, Springe

    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    Get PDF
    This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.Peer ReviewedPostprint (published version

    Systems approaches and algorithms for discovery of combinatorial therapies

    Full text link
    Effective therapy of complex diseases requires control of highly non-linear complex networks that remain incompletely characterized. In particular, drug intervention can be seen as control of signaling in cellular networks. Identification of control parameters presents an extreme challenge due to the combinatorial explosion of control possibilities in combination therapy and to the incomplete knowledge of the systems biology of cells. In this review paper we describe the main current and proposed approaches to the design of combinatorial therapies, including the empirical methods used now by clinicians and alternative approaches suggested recently by several authors. New approaches for designing combinations arising from systems biology are described. We discuss in special detail the design of algorithms that identify optimal control parameters in cellular networks based on a quantitative characterization of control landscapes, maximizing utilization of incomplete knowledge of the state and structure of intracellular networks. The use of new technology for high-throughput measurements is key to these new approaches to combination therapy and essential for the characterization of control landscapes and implementation of the algorithms. Combinatorial optimization in medical therapy is also compared with the combinatorial optimization of engineering and materials science and similarities and differences are delineated.Comment: 25 page

    Testing Real-World Healthcare IoT Application: Experiences and Lessons Learned

    Full text link
    Healthcare Internet of Things (IoT) applications require rigorous testing to ensure their dependability. Such applications are typically integrated with various third-party healthcare applications and medical devices through REST APIs. This integrated network of healthcare IoT applications leads to REST APIs with complicated and interdependent structures, thus creating a major challenge for automated system-level testing. We report an industrial evaluation of a state-of-the-art REST APIs testing approach (RESTest) on a real-world healthcare IoT application. We analyze the effectiveness of RESTest's testing strategies regarding REST APIs failures, faults in the application, and REST API coverage, by experimenting with six REST APIs of 41 API endpoints of the healthcare IoT application. Results show that several failures are discovered in different REST APIs with ~56% coverage using RESTest. Moreover, nine potential faults are identified. Using the evidence collected from the experiments, we provide our experiences and lessons learned.Comment: To appear in the Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023

    A reusable iterative optimization software library to solve combinatorial problems with approximate reasoning

    Get PDF
    Real world combinatorial optimization problems such as scheduling are typically too complex to solve with exact methods. Additionally, the problems often have to observe vaguely specified constraints of different importance, the available data may be uncertain, and compromises between antagonistic criteria may be necessary. We present a combination of approximate reasoning based constraints and iterative optimization based heuristics that help to model and solve such problems in a framework of C++ software libraries called StarFLIP++. While initially developed to schedule continuous caster units in steel plants, we present in this paper results from reusing the library components in a shift scheduling system for the workforce of an industrial production plant.Comment: 33 pages, 9 figures; for a project overview see http://www.dbai.tuwien.ac.at/proj/StarFLIP

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Get PDF
    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Ancient and historical systems

    Get PDF
    • …
    corecore