Using Bad Learners to find Good Configurations
Finding the optimally performing configuration of a software system for a
given setting is often challenging. Recent approaches address this challenge by
learning performance models based on a sample set of configurations. However,
building an accurate performance model can be very expensive (and is often
infeasible in practice). The central insight of this paper is that exact
performance values (e.g. the response time of a software system) are not
required to rank configurations and to identify the optimal one. As shown by
our experiments, models that are cheap to learn but inaccurate (with respect to
the difference between actual and predicted performance) can still be used to rank
configurations and hence find the optimal configuration. This novel
rank-based approach allows us to significantly reduce the cost (in terms
of the number of measurements of sample configurations) as well as the time required
to build models. We evaluate our approach with 21 scenarios based on 9 software
systems and demonstrate that our approach is beneficial in 16 scenarios; for
the remaining 5 scenarios, an accurate model can be built by using very few
samples anyway, without the need for a rank-based approach.
Comment: 11 pages, 11 figures
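The central point of the abstract above, that a model with a large prediction error can still rank configurations correctly, can be illustrated with a minimal sketch. The function names and the sample numbers are hypothetical and are not taken from the paper; lower performance values (e.g. response time) are assumed to be better.

```python
def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / actual over all configurations."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def rank_of_predicted_best(actual, predicted):
    """0-based true rank of the configuration the model predicts to be best.

    A result of 0 means the model picked the truly optimal configuration.
    """
    best_idx = min(range(len(predicted)), key=predicted.__getitem__)
    true_order = sorted(range(len(actual)), key=actual.__getitem__)
    return true_order.index(best_idx)


# Hypothetical measurements: the predictions are far off in absolute terms
# but preserve the ordering of the configurations.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [30.0, 35.0, 50.0, 70.0]

error = mean_relative_error(actual, predicted)
rank = rank_of_predicted_best(actual, predicted)
```

Here the mean relative error exceeds 200%, yet the predicted-best configuration is also the true best, which is the situation a rank-based evaluation is meant to capture.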
A Data-driven Resilience Framework of Directionality Configuration based on Topological Credentials in Road Networks
Roadway reconfiguration is a crucial aspect of transportation planning,
aiming to enhance traffic flow, reduce congestion, and improve overall road
network performance with existing infrastructure and resources. This paper
presents a novel roadway reconfiguration technique that integrates an
optimization-based brute-force search approach with a decision support framework to rank
various roadway configurations for better performance. The proposed framework
incorporates a multi-criteria decision analysis (MCDA) approach, combining
input from scenarios generated during the optimization process. By utilizing
data from optimization, the model identifies total betweenness centrality
(TBC), system travel time (STT), and total link traffic flow (TLTF) as the most
influential decision variables. The developed framework leverages graph theory
to model the transportation network topology and apply network science metrics
as well as stochastic user equilibrium traffic assignment to assess the impact
of each roadway configuration on the overall network performance. To rank the
roadway configurations, the framework employs machine learning algorithms, such
as ridge regression, to determine the optimal weights for each criterion (i.e.,
TBC, STT, TLTF). Moreover, the network-based analysis ensures that the selected
configurations not only optimize individual roadway segments but also enhance
system-level efficiency, which is particularly helpful as the increasing
frequency and intensity of natural disasters and other disruptive events
underscore the critical need for resilient transportation networks. By
integrating multi-criteria decision analysis, machine learning, and network
science metrics, the proposed framework would enable transportation planners to
make informed and data-driven decisions, leading to more sustainable,
efficient, and resilient roadway configurations.
Comment: 103rd Transportation Research Board (TRB) Annual Meeting
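A weighted multi-criteria ranking over TBC, STT, and TLTF, as described in the abstract above, might look like the following sketch. The weights are made up for illustration (the paper learns them, e.g. with ridge regression), the sample values are hypothetical, and the assumption that lower is better for every criterion is ours, not the paper's.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def rank_configurations(criteria, weights):
    """Rank configurations by a weighted sum of normalized criteria.

    criteria: dict mapping criterion name to a list of per-configuration values.
    weights:  dict mapping criterion name to its (assumed) weight.
    Returns (order, scores), where order lists configuration indices best first.
    Assumption: lower values are better for every criterion.
    """
    n = len(next(iter(criteria.values())))
    normalized = {name: min_max(vals) for name, vals in criteria.items()}
    scores = [
        sum(weights[name] * normalized[name][i] for name in criteria)
        for i in range(n)
    ]
    order = sorted(range(n), key=scores.__getitem__)
    return order, scores


# Hypothetical values for three roadway configurations.
criteria = {
    "TBC": [100, 80, 120],
    "STT": [50, 60, 40],
    "TLTF": [900, 700, 750],
}
weights = {"TBC": 0.5, "STT": 0.3, "TLTF": 0.2}  # illustrative, not learned

order, scores = rank_configurations(criteria, weights)
```

With these numbers, configuration 1 receives the lowest weighted score and is ranked first.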
CM-CASL: Comparison-based Performance Modeling of Software Systems via Collaborative Active and Semisupervised Learning
Configuration tuning for large software systems is generally challenging due
to the complex configuration space and expensive performance evaluation. Most
existing approaches follow a two-phase process, first learning a
regression-based performance prediction model on available samples and then
searching for the configurations with satisfactory performance using the
learned model. Such regression-based models often suffer from the scarcity of
samples due to the enormous time and resources required to run a large software
system with a specific configuration. Moreover, previous studies have shown
that even a highly accurate regression-based model may fail to discern the
relative merit between two configurations, whereas performance comparison is
actually one fundamental strategy for configuration tuning. To address these
issues, this paper proposes CM-CASL, a Comparison-based performance Modeling
approach for software systems via Collaborative Active and Semisupervised
Learning. CM-CASL learns a classification model that compares the performance
of two given configurations, and enhances the samples through a collaborative
labeling process by both human experts and classifiers using an integration of
active and semisupervised learning. Experimental results demonstrate that
CM-CASL outperforms two state-of-the-art performance modeling approaches in
terms of both classification accuracy and rank accuracy, and thus provides a
better performance model for the subsequent work of configuration tuning.
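The comparison-based idea in the abstract above, learning which of two configurations performs better rather than predicting each one's performance, can be sketched as follows. This shows only how pairwise examples are built and how a comparator yields a ranking; it does not implement CM-CASL's active or semisupervised learning. All names and data are hypothetical, and lower performance values are assumed to be better.

```python
from itertools import permutations


def make_pairs(configs, perf):
    """Build pairwise examples ((config_a, config_b), a_is_better)."""
    return [
        ((configs[i], configs[j]), perf[i] < perf[j])
        for i, j in permutations(range(len(configs)), 2)
    ]


def rank_by_wins(configs, better):
    """Rank configurations by how many others they beat under `better`."""
    wins = {
        c: sum(1 for other in configs if other != c and better(c, other))
        for c in configs
    }
    return sorted(configs, key=lambda c: -wins[c])


# Hypothetical measured configurations (tuples of option values).
configs = [(0, 0), (0, 1), (1, 0)]
perf = [5.0, 3.0, 9.0]

pairs = make_pairs(configs, perf)

# Stand-in for a trained classifier: a lookup over the labelled pairs.
labels = dict(pairs)
ranking = rank_by_wins(configs, lambda a, b: labels[(a, b)])
```

In practice the lookup table would be replaced by a classifier trained on the labelled pairs, so that unmeasured configurations can also be compared.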
Selective Query Processing: a Risk-Sensitive Selection of System Configurations
In information retrieval systems, search parameters are optimized to ensure
high effectiveness based on a set of past searches and these optimized
parameters are then used as the system configuration for all subsequent
queries. A better approach, however, would be to adapt the parameters to fit
the query at hand. Selective query expansion is one such approach, in which
the system decides automatically whether or not to expand the query, resulting
in two possible system configurations. This approach was extended recently to
include many other parameters, leading to many possible system configurations
where the system automatically selects the best configuration on a per-query
basis. To determine the ideal configurations to use on a per-query basis in
real-world systems, we developed a method in which a restricted number of
possible configurations is pre-selected and then used in a meta-search engine
that decides the best search configuration on a per-query basis. We define a
risk-sensitive approach for configuration pre-selection that considers the
risk-reward trade-off between the number of configurations kept, and system
effectiveness. For final configuration selection, the decision is based on
query feature similarities. We find that a relatively small number of
configurations (20) selected by our risk-sensitive model is sufficient to
increase effectiveness by about 15% (according to P@10 and nDCG@10) when compared to
traditional grid search using a single configuration and by about 20% when
compared to learning to rank documents. Our risk-sensitive approach works for
both diversity- and ad hoc-oriented searches. Moreover, the similarity-based
selection method outperforms the more sophisticated approaches. Thus, we
demonstrate the feasibility of developing per-query information retrieval
systems, which will guide future research in this direction.
Comment: 30 pages, 5 figures, 8 tables; submitted to TOIS ACM journal
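The abstract above states only that the final configuration is chosen from query feature similarities. One simple way to realize that, shown here as an assumption rather than the authors' method, is to reuse the best-known configuration of the most similar past query under cosine similarity. The feature vectors and configuration names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_configuration(query_features, history):
    """Return the configuration of the most similar past query.

    history: list of (feature_vector, best_config_id) for past queries,
    where best_config_id is one of the pre-selected configurations.
    """
    _, config = max(history, key=lambda item: cosine(query_features, item[0]))
    return config


# Hypothetical past queries with their best pre-selected configuration.
history = [
    ([1.0, 0.0], "cfg_a"),
    ([0.0, 1.0], "cfg_b"),
]

chosen = select_configuration([0.9, 0.1], history)
```

A nearest-neighbour rule like this needs no training beyond storing past queries, which is consistent with the abstract's remark that the similarity-based selection is the simpler of the approaches compared.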
Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service
An increasing number of Analytics-as-a-Service solutions have recently
emerged in the landscape of cloud-based services. These services allow
flexible composition of compute and storage components that create powerful
data ingestion and processing pipelines. This work is a first attempt at an
experimental evaluation of analytic application performance executed using a
wide range of storage service configurations. We present an intuitive notion of
data locality, which we use as a proxy to rank different service compositions in
terms of expected performance. Through an empirical analysis, we dissect the
performance achieved by analytic workloads and unveil problems due to the
impedance mismatch that arises in some configurations. Our work paves the way to
a better understanding of modern cloud-based analytic services and their
performance, both for their end-users and their providers.
Comment: Longer version of the paper in Submission at IEEE CLOUD'1