8,770 research outputs found
Smoke Testing for Machine Learning: Simple Tests to Discover Severe Defects
Machine learning is nowadays a standard technique for data analysis within
software applications. Software engineers need quality assurance techniques
that are suitable for these new kinds of systems. Within this article, we
discuss the question whether standard software testing techniques that have
been part of textbooks since decades are also useful for the testing of machine
learning software. Concretely, we try to determine generic and simple smoke
tests that can be used to assert that basic functions can be executed without
crashing. We found that we can derive such tests using techniques similar to
equivalence classes and boundary value analysis. Moreover, we found that these
concepts can also be applied to hyperparameters, to further improve the quality
of the smoke tests. Even though our approach is almost trivial, we were able to
find bugs in all three machine learning libraries that we tested and severe
bugs in two of the three libraries. This demonstrates that common software
testing techniques are still valid in the age of machine learning and that
considerations how they can be adapted to this new context can help to find and
prevent severe bugs, even in mature machine learning libraries.Comment: Accepted at Empirical Software Engineering, Springe
ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments
This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement
No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.Peer ReviewedPostprint (published version
Systems approaches and algorithms for discovery of combinatorial therapies
Effective therapy of complex diseases requires control of highly non-linear
complex networks that remain incompletely characterized. In particular, drug
intervention can be seen as control of signaling in cellular networks.
Identification of control parameters presents an extreme challenge due to the
combinatorial explosion of control possibilities in combination therapy and to
the incomplete knowledge of the systems biology of cells. In this review paper
we describe the main current and proposed approaches to the design of
combinatorial therapies, including the empirical methods used now by clinicians
and alternative approaches suggested recently by several authors. New
approaches for designing combinations arising from systems biology are
described. We discuss in special detail the design of algorithms that identify
optimal control parameters in cellular networks based on a quantitative
characterization of control landscapes, maximizing utilization of incomplete
knowledge of the state and structure of intracellular networks. The use of new
technology for high-throughput measurements is key to these new approaches to
combination therapy and essential for the characterization of control
landscapes and implementation of the algorithms. Combinatorial optimization in
medical therapy is also compared with the combinatorial optimization of
engineering and materials science and similarities and differences are
delineated.Comment: 25 page
Testing Real-World Healthcare IoT Application: Experiences and Lessons Learned
Healthcare Internet of Things (IoT) applications require rigorous testing to
ensure their dependability. Such applications are typically integrated with
various third-party healthcare applications and medical devices through REST
APIs. This integrated network of healthcare IoT applications leads to REST APIs
with complicated and interdependent structures, thus creating a major challenge
for automated system-level testing. We report an industrial evaluation of a
state-of-the-art REST APIs testing approach (RESTest) on a real-world
healthcare IoT application. We analyze the effectiveness of RESTest's testing
strategies regarding REST APIs failures, faults in the application, and REST
API coverage, by experimenting with six REST APIs of 41 API endpoints of the
healthcare IoT application. Results show that several failures are discovered
in different REST APIs with ~56% coverage using RESTest. Moreover, nine
potential faults are identified. Using the evidence collected from the
experiments, we provide our experiences and lessons learned.Comment: To appear in the Proceedings of the 31st ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE 2023
A reusable iterative optimization software library to solve combinatorial problems with approximate reasoning
Real world combinatorial optimization problems such as scheduling are
typically too complex to solve with exact methods. Additionally, the problems
often have to observe vaguely specified constraints of different importance,
the available data may be uncertain, and compromises between antagonistic
criteria may be necessary. We present a combination of approximate reasoning
based constraints and iterative optimization based heuristics that help to
model and solve such problems in a framework of C++ software libraries called
StarFLIP++. While initially developed to schedule continuous caster units in
steel plants, we present in this paper results from reusing the library
components in a shift scheduling system for the workforce of an industrial
production plant.Comment: 33 pages, 9 figures; for a project overview see
http://www.dbai.tuwien.ac.at/proj/StarFLIP
Dagstuhl Reports : Volume 1, Issue 2, February 2011
Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn
- …