
    Finding the Optimal Balance between Over and Under Approximation of Models Inferred from Execution Logs

    Models inferred from execution traces (logs) may admit more behaviours than those possible in the real system (over-approximation) or may exclude behaviours that can indeed occur in the real system (under-approximation). Both problems negatively affect model-based testing: over-approximation results in infeasible test cases, i.e., test cases that cannot be activated by any input data, while under-approximation results in missing test cases, i.e., system behaviours that are not represented in the model and are therefore never tested. In this paper we balance over- and under-approximation of inferred models by means of multi-objective optimization, using two search-based algorithms: a multi-objective Genetic Algorithm (GA) and NSGA-II. We report results on two open-source web applications and compare the multi-objective optimization to the state-of-the-art KLFA tool. We show that it is possible to identify regions of the Pareto front containing models that violate fewer application constraints and have a higher bug detection ratio. The Pareto fronts generated by the multi-objective GA contain a region where models violate on average 2% of an application's constraints, compared to 2.8% for NSGA-II and 28.3% for the KLFA models. Similarly, it is possible to identify a region of the Pareto front where models inferred by the multi-objective GA have an average bug detection ratio of 110:3 and models inferred by NSGA-II have an average bug detection ratio of 101:6, compared to a bug detection ratio of 310928:13 for the KLFA tool. © 2012 IEEE
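    To make the Pareto-based selection concrete, the following minimal Python sketch shows how candidate models, already scored on the two competing objectives (an over-approximation score and an under-approximation score, both to be minimised), can be reduced to a Pareto front. The candidate names, scores, and objective definitions are invented for illustration; this is not the authors' implementation or their fitness functions.

        # Illustrative sketch (not the paper's implementation): given candidate models
        # already scored on two objectives to minimise -- an over-approximation score
        # (e.g. fraction of infeasible generated tests) and an under-approximation
        # score (e.g. fraction of observed traces the model rejects) -- keep only the
        # Pareto-optimal candidates, from which a preferred region can then be picked.

        def dominates(a, b):
            # a dominates b if it is no worse in every objective and better in at least one.
            return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

        def pareto_front(scored_models):
            # scored_models: list of (name, (over_approx, under_approx)) tuples.
            return [(name, score) for name, score in scored_models
                    if not any(dominates(other, score) for _, other in scored_models)]

        candidates = [("m1", (0.05, 0.40)), ("m2", (0.20, 0.10)),
                      ("m3", (0.25, 0.35)), ("m4", (0.02, 0.60))]
        print(pareto_front(candidates))   # m3 is dominated by m2 and is dropped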

    Stochastic Privacy

    Online services such as web search and e-commerce applications typically rely on the collection of data about users, including details of their activities on the web. Such personal data is used to enhance the quality of service via personalization of content and to maximize revenues via better targeting of advertisements and deeper engagement of users on sites. To date, service providers have largely followed the approach of either requiring or requesting consent for opting in to share their data. Users may be willing to share private information in return for better quality of service or for incentives, or in return for assurances about the nature and extent of the logging of data. We introduce "stochastic privacy", a new approach to privacy centering on a simple concept: a guarantee provided to users about the upper bound on the probability that their personal data will be used. Such a probability, which we refer to as "privacy risk", can be assessed by users as a preference or communicated as a policy by a service provider. Service providers can work to personalize services and to optimize revenues in accordance with preferences about privacy risk. We present procedures, proofs, and an overall system for maximizing the quality of services while respecting bounds on allowable or communicated privacy risk. We demonstrate the methodology with a case study and an evaluation of the procedures applied to web search personalization, and show how we can achieve near-optimal utility of accessing information with provable guarantees on the probability of sharing data.
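    A minimal Python sketch of the core guarantee described above: a user's data is included in logging with probability no greater than that user's stated privacy-risk bound. The selection rule here is plain Bernoulli sampling and the user names and values are invented; the paper's actual procedures additionally optimise service utility subject to this bound, which is not shown.

        # Minimal sketch of the stochastic-privacy guarantee: each user's data is
        # logged with probability at most that user's privacy-risk bound.
        import random

        def select_for_logging(users, seed=None):
            # users: dict mapping user_id -> privacy_risk (upper bound in [0, 1]).
            rng = random.Random(seed)
            return [uid for uid, risk in users.items() if rng.random() < risk]

        prefs = {"alice": 0.01, "bob": 0.10, "carol": 0.0}
        print(select_for_logging(prefs, seed=42))  # carol can never be selected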

    Exploring the link between test suite quality and automatic specification inference

    While no one doubts the importance of correct and complete specifications, many industrial systems still do not have formal specifications written out, and even when they do, it is hard to check their correctness and completeness. This work explores the possibility of using an invariant extraction tool such as Daikon to automatically infer specifications from available test suites, with the idea of aiding software engineers to improve their specifications by giving them another version to compare against. Given that our initial experiments did not produce satisfactory results, in this paper we explore which test suite attributes influence the quality of the inferred specification. Following further study, we found that instruction, branch and method coverage are correlated with high recall values, reaching up to 97.93%.
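    A small Python sketch of the kind of analysis the abstract describes: measure the recall of inferred invariants against a reference specification for several test suites, then correlate recall with a coverage measure. The invariants, coverage figures, and recall values below are made up for illustration and are not the study's data.

        # Hypothetical sketch: recall of an inferred specification against a
        # reference, and correlation of per-suite recall with branch coverage.

        def recall(inferred, reference):
            # Fraction of reference invariants that the inferred set recovers.
            return len(set(inferred) & set(reference)) / len(reference)

        def pearson(xs, ys):
            n = len(xs)
            mx, my = sum(xs) / n, sum(ys) / n
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            sx = sum((x - mx) ** 2 for x in xs) ** 0.5
            sy = sum((y - my) ** 2 for y in ys) ** 0.5
            return cov / (sx * sy)

        print(recall({"x >= 0", "y != null"}, {"x >= 0", "y != null", "x < n"}))

        branch_coverage = [0.35, 0.60, 0.82, 0.95]   # one value per test suite
        recall_values   = [0.41, 0.58, 0.86, 0.97]   # recall of inferred spec
        print(pearson(branch_coverage, recall_values))  # strong positive correlation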

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Versatile networking services strongly influence daily life, while their amount and diversity make network systems highly complex. Network scale and complexity grow with the increasing number of infrastructure devices, network functions, and network slices, and with the evolution of the underlying architecture. The conventional approach of maintaining such large and complex platforms through manual administration makes effective, insightful management difficult. A feasible and promising alternative is to extract insightful information from the large volumes of network data these systems produce. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge in network data and thereby support intelligent management and maintenance. The thesis focuses on two themes: network anomaly detection with root cause localization, and critical traffic resource control and optimization. First, network data carries informative messages, but its heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies: it employs representation learning to map unstructured data into numerical features and fuses the extracted features for network anomaly and fault detection, using word2vec-based embeddings for semantic representation. Next, fault and anomaly detection only reveals that an event has occurred without identifying its cause, so fault localization is introduced to narrow down the source of systematic anomalies. The extracted features are turned into anomaly degrees and coupled with an importance-ranking method to highlight the locations of anomalies in the network system; two ranking modes, based on PageRank and on operation errors, jointly highlight the locations of latent issues. Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. In particular, when traffic is constrained by communication conditions, proactive path planning enables efficient traffic control, so a learning-based traffic planning algorithm is proposed, based on a sequence-to-sequence model, to discover reasonable hidden paths from abundant historical traffic data over a Software Defined Network architecture. Finally, traffic engineering based purely on empirical data is likely to yield stale, sub-optimal solutions or even degrade performance, so a resilient mechanism is required to adapt network flows to a dynamic environment based on context. A reinforcement learning-based scheme is therefore put forward for dynamic data forwarding that takes network resource status into account and shows a clear performance improvement. In summary, the proposed anomaly processing framework strengthens analysis and diagnosis for network system administrators by combining fault detection with root cause localization, and the learning-based traffic engineering improves network flow management from historical data, pointing towards flexible traffic adjustment in ever-changing environments.
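    As a toy illustration of one step described above (the regularization of unstructured logs into abstract templates before embedding), the following Python sketch masks variable fields so that records with the same structure map to the same template. Real systems use dedicated log parsers; the regexes and log lines here are simplified assumptions, and the embedding, ranking, and traffic-engineering stages are not shown.

        # Toy sketch: turn raw log lines into abstract templates by masking
        # variable fields, then count how often each template occurs.
        import re
        from collections import Counter

        def to_template(line):
            line = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<IP>", line)   # IPv4 addresses
            line = re.sub(r"\b\d+\b", "<NUM>", line)                    # bare numbers
            return line

        logs = [
            "connection from 10.0.0.5 port 443 established",
            "connection from 10.0.0.9 port 8080 established",
            "disk usage at 91 percent on node 3",
        ]
        templates = Counter(to_template(l) for l in logs)
        print(templates.most_common())   # two distinct templates, one occurring twice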

    LIFTS: Learning Featured Transition Systems


    Enhanced feature mining and classifier models to predict customer churn for an e-retailer

    Customer churn, the event of a customer abandoning an established relationship with a business, is an important problem that has been well researched in both academic and commercial settings. In this work, we propose an improved prediction model that emphasizes an effective data collection pipeline across varied channels, capturing explicit and implicit customer footprints. Our goal is to demonstrate how feature selection algorithms can improve classifier efficiency; we also rank the prominent features that play a vital role in customer churn. Our contributions in this paper are threefold. First, we show how popular data mining tools in the Hadoop stack help extract several implicit customer interaction metrics, including Sales and Clickstream logs generated by customer interaction. Second, through feature engineering techniques we verify that some of the new features we propose have a definite impact on customer churn. Finally, we establish that Regularized Logistic Regression, SVM and Gradient Boosted Random Forests are the best performing models for predicting customer churn, verified through comprehensive cross-validation.
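    A sketch in Python (scikit-learn) of the modelling stage described above: feature selection feeding a regularized logistic regression, evaluated with cross-validation. The features, labels, and chosen parameters are synthetic placeholders for the sales- and clickstream-derived features; the authors' Hadoop-based extraction pipeline and their SVM and gradient-boosted variants are not reproduced here.

        # Sketch of the modelling stage (not the authors' exact pipeline):
        # feature selection -> regularized logistic regression -> cross-validated AUC.
        import numpy as np
        from sklearn.pipeline import Pipeline
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20))            # e.g. visit counts, recency, spend
        y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

        model = Pipeline([
            ("select", SelectKBest(f_classif, k=5)),          # keep strongest features
            ("clf", LogisticRegression(penalty="l2", C=1.0)), # regularized classifier
        ])
        scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
        print(scores.mean())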

    N-Gram Based Test Sequence Generation from Finite State Models

    Model-based testing offers a powerful mechanism to test applications that change dynamically and continuously, for which only limited black-box knowledge is available (as is typically the case for future internet applications). Models can be inferred from observations of real executions, and test cases can be derived from the models according to various strategies (e.g., graph visits or random visits). The problem is that a relatively large proportion of the test cases obtained in this way may turn out to be non-executable, because they involve infeasible paths. In this paper, we propose a novel test case derivation strategy based on the computation of N-gram statistics: event sequences are generated whose subsequences of size N respect the distribution of the N-tuples observed in the execution traces. In this way, generated and observed sequences share the same context (up to length N), which increases the likelihood that the generated sequences are actually executable. A consequence of the increased proportion of feasible test cases is that model coverage is also expected to increase.
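    A minimal Python sketch of the generation idea: estimate N-gram statistics (here N=2) from observed event traces and produce new sequences by sampling successors according to those statistics, so that local context matches what was observed. The event names and traces are invented, and the paper's full derivation strategy from the finite state model is more involved than this illustration.

        # Minimal sketch: learn bigram statistics from observed event traces and
        # generate new event sequences whose local context follows that distribution,
        # making them more likely to be executable.
        import random
        from collections import defaultdict

        def build_bigrams(traces):
            counts = defaultdict(lambda: defaultdict(int))
            for trace in traces:
                for a, b in zip(trace, trace[1:]):
                    counts[a][b] += 1
            return counts

        def generate(counts, start, length, seed=None):
            rng = random.Random(seed)
            seq, current = [start], start
            for _ in range(length - 1):
                successors = counts.get(current)
                if not successors:
                    break
                events, weights = zip(*successors.items())
                current = rng.choices(events, weights=weights, k=1)[0]
                seq.append(current)
            return seq

        traces = [["login", "search", "view", "logout"],
                  ["login", "search", "search", "view", "buy", "logout"]]
        model = build_bigrams(traces)
        print(generate(model, "login", 6, seed=1))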