Finding the Optimal Balance between Over and Under Approximation of Models Inferred from Execution Logs
Models inferred from execution traces (logs) may admit more behaviours than those possible in the real system (over-approximation) or may exclude behaviours that can indeed occur in the real system (under-approximation). Both problems negatively affect model-based testing. In fact, over-approximation results in infeasible test cases, i.e., test cases that cannot be activated by any input data, while under-approximation results in missing test cases, i.e., system behaviours that are not represented in the model are never tested. In this paper we balance over- and under-approximation of inferred models by resorting to multi-objective optimization, achieved by means of two search-based algorithms: a multi-objective Genetic Algorithm (GA) and NSGA-II. We report results on two open-source web applications and compare the multi-objective optimization to the state-of-the-art KLFA tool. We show that it is possible to identify regions of the Pareto front that contain models which violate fewer application constraints and have a higher bug detection ratio. The Pareto fronts generated by the multi-objective GA contain a region where models violate on average 2% of an application's constraints, compared to 2.8% for NSGA-II and 28.3% for the KLFA models. Similarly, it is possible to identify a region of the Pareto front where the models inferred by the multi-objective GA have an average bug detection ratio of 110:3 and those inferred by NSGA-II have an average bug detection ratio of 101:6, compared to 310928:13 for the KLFA tool. © 2012 IEEE
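The multi-objective selection described above rests on Pareto dominance over the two competing objectives. As a minimal sketch (the model names and objective values below are invented for illustration, both objectives minimized), a model stays on the front only if no other model is at least as good on both objectives:

```python
# Illustrative Pareto-front extraction over two minimized objectives:
# over-approximation (rate of infeasible tests) and under-approximation
# (rate of missed behaviours). Values are made up, not the paper's data.

def pareto_front(models):
    """Return models not dominated by any other model (lower on both objectives)."""
    front = []
    for name, over, under in models:
        dominated = any(
            o <= over and u <= under and (o, u) != (over, under)
            for _, o, u in models
        )
        if not dominated:
            front.append((name, over, under))
    return front

candidates = [
    ("m1", 0.30, 0.05),  # permissive model: many infeasible tests
    ("m2", 0.10, 0.10),  # balanced trade-off
    ("m3", 0.05, 0.40),  # strict model: misses real behaviours
    ("m4", 0.20, 0.20),  # dominated by m2 on both objectives
]
front = pareto_front(candidates)
```

The GA and NSGA-II in the paper search this trade-off space; the sketch only shows the dominance test that defines the front they report regions of.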
Stochastic Privacy
Online services such as web search and e-commerce applications typically rely
on the collection of data about users, including details of their activities on
the web. Such personal data is used to enhance the quality of service via
personalization of content and to maximize revenues via better targeting of
advertisements and deeper engagement of users on sites. To date, service
providers have largely followed the approach of either requiring or requesting
consent from users to opt in to sharing their data. Users may be willing to share
private information in return for better quality of service or for incentives,
or in return for assurances about the nature and extent of the logging of data.
We introduce \emph{stochastic privacy}, a new approach to privacy centering on
a simple concept: A guarantee is provided to users about the upper-bound on the
probability that their personal data will be used. Such a probability, which we
refer to as \emph{privacy risk}, can be assessed by users as a preference or
communicated as a policy by a service provider. Service providers can work to
personalize and to optimize revenues in accordance with preferences about
privacy risk. We present procedures, proofs, and an overall system for
maximizing the quality of services, while respecting bounds on allowable or
communicated privacy risk. We demonstrate the methodology with a case study and
evaluation of the procedures applied to web search personalization. We show how
we can achieve near-optimal utility of accessing information with provable
guarantees on the probability of sharing data.
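The core guarantee can be sketched very simply (this is an assumption-laden illustration, not the paper's full procedure, which also optimizes service utility under the bound): each user's data is selected for logging with probability no greater than that user's privacy-risk bound.

```python
# Minimal sketch of the stochastic-privacy guarantee: a user's data is
# logged with probability at most their stated privacy-risk bound.
# User names and bounds are invented for illustration.
import random

def select_for_logging(users, rng=random.Random(0)):
    """users: dict mapping user_id -> privacy-risk bound in [0, 1]."""
    return [uid for uid, risk in users.items() if rng.random() < risk]

prefs = {"alice": 0.0, "bob": 0.1, "carol": 1.0}
logged = select_for_logging(prefs)
# A risk bound of 0.0 means the user's data is never selected;
# a bound of 1.0 places no restriction on selection.
```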
Exploring the link between test suite quality and automatic specification inference
While no one doubts the importance of correct and complete specifications, many industrial systems
still do not have formal specifications written down, and even when they do, it is hard to check their
correctness and completeness. This work explores the possibility of using an invariant extraction tool
such as Daikon to automatically infer specifications from available test suites with the idea of aiding
software engineers to improve the specifications by having another version to compare to. Given that
our initial experiments did not produce satisfactory results, in this paper we explore which test suite
attributes influence the quality of the inferred specification. Following further study, we found that
instruction, branch and method coverage correlate with high recall values, reaching up to 97.93%.
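The coverage-recall relationship the study reports can be quantified with a Pearson correlation coefficient; the sketch below uses made-up data points purely to show the computation (the 97.93% figure above is the paper's result and is not reproduced here):

```python
# Pearson correlation between a coverage metric and specification recall,
# computed from scratch. The five data points are invented for illustration.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

branch_coverage = [0.40, 0.55, 0.70, 0.85, 0.95]
spec_recall     = [0.50, 0.62, 0.75, 0.90, 0.97]
r = pearson(branch_coverage, spec_recall)  # near 1.0: strong positive link
```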
A Machine Learning Enhanced Scheme for Intelligent Network Management
Versatile networking services strongly influence daily life, while the amount and diversity of those services make network systems highly complex. Network scale and complexity grow with increasing infrastructure, networking functions, network slices, and the evolution of the underlying architecture. The conventional approach to maintaining such a large and complex platform is manual administration, which makes effective and insightful management troublesome. A feasible and promising alternative is to extract insightful information from the large volumes of network data produced. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge in substantial network data, directly promoting intelligent management and maintenance. The thesis focuses on two management and maintenance schemes: network anomaly detection with root cause localization, and critical traffic resource control and optimization. Firstly, the abundant network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features, and fuses the extracted features for network anomaly and fault detection. The representation learning uses word2vec-based embedding technologies for semantic expression. However, fault and anomaly detection only reveals that an event has occurred, without identifying its root cause, so fault localization is needed to narrow down the source of systematic anomalies.
The extracted features form an anomaly degree, coupled with an importance-ranking method to highlight the locations of anomalies in network systems. Two ranking modes, instantiated by PageRank and by operation errors, jointly highlight latent problem locations. Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. Especially when network traffic is constrained by communication conditions, a proactive path-planning scheme enables efficient traffic control actions. A learning-based traffic planning algorithm is therefore proposed, based on a sequence-to-sequence model, to discover reasonable hidden paths from abundant traffic history over a Software Defined Network architecture. Finally, traffic engineering based merely on empirical data is likely to yield stale, sub-optimal solutions, or even worse outcomes; a resilient mechanism is required to adapt network flows to a dynamic environment based on context. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding that considers network resource status, and it shows a promising performance improvement. In conclusion, the proposed anomaly-processing framework strengthens analysis and diagnosis for network system administrators through combined fault detection and root cause localization, while the learning-based traffic engineering improves network flow management using historical data and points to a promising direction for flexible traffic adjustment in ever-changing environments.
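The PageRank-based ranking mode mentioned above can be sketched on a toy service dependency graph (the graph, node names, and damping factor are illustrative assumptions; the thesis combines this with operation-error signals, which are omitted here):

```python
# Hedged sketch: PageRank over a small service dependency graph, used to
# rank nodes as candidate fault locations. Graph and parameters are invented.

def pagerank(graph, damping=0.85, iters=50):
    """graph: node -> list of nodes it links to. Returns node -> rank."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in graph.items():
            if not outs:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

deps = {"gateway": ["auth", "db"], "auth": ["db"], "db": []}
ranks = pagerank(deps)
# "db" is depended on by both services, so it ranks highest and would be
# inspected first when anomalies are observed downstream.
```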
Enhanced feature mining and classifier models to predict customer churn for an e-retailer
Customer churn, the event of a customer abandoning an
established relationship with a business, is an important
problem, well researched in both academic and commercial
settings. In this work, we propose an improved prediction
model that emphasizes an effective data collection pipeline
across varied channels, capturing explicit and implicit
customer footprints. Our goal is to demonstrate how feature
selection algorithms can improve classifier efficiency. We
also rank the prominent features that play a vital role in
customer churn. Our contributions are threefold: first, we
show how popular data mining tools in the Hadoop stack help
extract several implicit customer interaction metrics,
including sales and clickstream logs generated by customer
interaction. Second, through feature engineering techniques
we verify that some of the new features we propose have a
definite impact on customer churn. Finally, we establish
that regularized Logistic Regression, SVM and Gradient
Boost Random Forests are the best-performing models for
predicting customer churn, verified through comprehensive
cross-validation techniques.
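As a hedged illustration of the modelling step (not the paper's actual pipeline, features, or data), a small L2-regularized logistic regression trained by gradient descent can rank invented churn features by weight magnitude:

```python
# Illustrative sketch: L2-regularized logistic regression on toy churn
# data, then ranking features by absolute weight. All names and numbers
# are invented; the paper's real pipeline uses Hadoop-extracted metrics.
import math

def train_logreg(X, y, lr=0.1, l2=0.01, epochs=500):
    """Stochastic gradient descent on the regularized logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted churn probability
            for j, xj in enumerate(xi):
                w[j] -= lr * ((p - yi) * xj + l2 * w[j])
    return w

features = ["days_since_last_order", "support_tickets", "discount_rate"]
X = [[0.9, 0.8, 0.1], [0.8, 0.7, 0.2], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
y = [1, 1, 0, 0]  # 1 = churned
w = train_logreg(X, y)
ranking = sorted(zip(features, w), key=lambda t: abs(t[1]), reverse=True)
```

On this toy data the inactivity features carry positive weight (raising churn probability) while the discount feature carries negative weight, which is the kind of ranking the paper derives for its real features.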
N-Gram Based Test Sequence Generation from Finite State Models
Model-based testing offers a powerful mechanism to test applications that change dynamically and continuously, and for which only limited black-box knowledge is available (typically the case for future internet applications). Models can be inferred from observations of real executions, and test cases can be derived from the models according to various strategies (e.g., graph visits or random visits). The problem is that a relatively large proportion of the test cases obtained in this way may turn out to be non-executable, because they involve infeasible paths. In this paper, we propose a novel test case derivation strategy based on the computation of N-gram statistics. Event sequences are generated in which the subsequences of size N respect the distribution of the N-tuples observed in the execution traces. In this way, generated and observed sequences share the same context (up to length N), increasing the likelihood that the generated sequences are actually executable. As a consequence of the increased proportion of feasible test cases, model coverage is also expected to increase.
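The generation strategy can be sketched for N = 2 (event names and traces below are invented, not the paper's subject applications): collect the observed N-tuples from execution traces, then extend a sequence one event at a time so that every length-N subsequence was actually observed.

```python
# Sketch of N-gram based sequence generation: each generated event is
# sampled from the successors observed after the current (N-1)-context,
# so every length-N window of the output occurs in some real trace.
import random
from collections import defaultdict

def build_ngram_model(traces, n=2):
    """Map each (n-1)-length context to the events seen right after it."""
    model = defaultdict(list)
    for trace in traces:
        for i in range(len(trace) - n + 1):
            context, nxt = tuple(trace[i:i + n - 1]), trace[i + n - 1]
            model[context].append(nxt)
    return model

def generate(model, start, length, n=2, rng=random.Random(0)):
    """Extend `start` by sampling observed successors of the last context."""
    seq = list(start)
    while len(seq) < length:
        choices = model.get(tuple(seq[-(n - 1):]))
        if not choices:  # context never observed: stop the sequence
            break
        seq.append(rng.choice(choices))
    return seq

traces = [["login", "search", "view", "logout"],
          ["login", "view", "search", "logout"]]
model = build_ngram_model(traces, n=2)
test_sequence = generate(model, ["login"], length=4)
# Every adjacent pair in test_sequence occurs in some observed trace,
# which is what raises the chance that the sequence is executable.
```

Sampling successors in proportion to their observed counts (duplicates are kept in the lists) is what makes the generated windows respect the empirical N-tuple distribution.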