Learning Tractable Probabilistic Models for Fault Localization
In recent years, several probabilistic techniques have been applied to
various debugging problems. However, most existing probabilistic debugging
systems use relatively simple statistical models, and fail to generalize across
multiple programs. In this work, we propose Tractable Fault Localization Models
(TFLMs) that can be learned from data, and probabilistically infer the location
of the bug. While most previous statistical debugging methods generalize over
many executions of a single program, TFLMs are trained on a corpus of
previously seen buggy programs, and learn to identify recurring patterns of
bugs. Widely-used fault localization techniques such as TARANTULA evaluate the
suspiciousness of each line in isolation; in contrast, a TFLM defines a joint
probability distribution over buggy indicator variables for each line. Joint
distributions with rich dependency structure are often computationally
intractable; TFLMs avoid this by exploiting recent developments in tractable
probabilistic models (specifically, Relational SPNs). Further, TFLMs can
incorporate additional sources of information, including coverage-based
features such as TARANTULA. We evaluate the fault localization performance of
TFLMs that include TARANTULA scores as features in the probabilistic model. Our
study shows that the learned TFLMs isolate bugs more effectively than previous
statistical methods or using TARANTULA directly.
Comment: Fifth International Workshop on Statistical Relational AI (StaR-AI 2015)
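For readers unfamiliar with the baseline mentioned above, the per-line TARANTULA suspiciousness score can be sketched as follows. This is a minimal illustration of the standard coverage-based formula, not the TFLM model itself; the function name `tarantula_scores` and the input layout (one set of covered line numbers per test, plus a pass/fail flag per test) are assumptions made for the example.

```python
def tarantula_scores(coverage, outcomes):
    """Score each line independently from test coverage.

    coverage: list of sets, the line numbers executed by each test (assumed layout).
    outcomes: list of bools, True if the corresponding test passed.
    Returns a dict mapping line number -> suspiciousness in [0, 1].
    """
    total_passed = sum(outcomes)
    total_failed = len(outcomes) - total_passed
    lines = set().union(*coverage)
    scores = {}
    for line in lines:
        passed = sum(1 for cov, ok in zip(coverage, outcomes) if ok and line in cov)
        failed = sum(1 for cov, ok in zip(coverage, outcomes) if not ok and line in cov)
        # Fraction of passing / failing tests that execute this line.
        pass_ratio = passed / total_passed if total_passed else 0.0
        fail_ratio = failed / total_failed if total_failed else 0.0
        denom = pass_ratio + fail_ratio
        scores[line] = fail_ratio / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    cov = [{1, 2}, {1, 3}, {1, 3}]
    res = [True, True, False]
    for line, score in sorted(tarantula_scores(cov, res).items()):
        print(line, round(score, 3))
```

Each line is scored in isolation here, which is exactly the limitation the abstract contrasts with a joint distribution over per-line bug indicators.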
User Behavior-Based Implicit Authentication
In this work, we propose dynamic retraining (RU), a wind vane module (WVM), BubbleMap (BMap), and reinforcement authentication (RA) to improve the efficacy of implicit authentication (IA). Motivated by the great potential of implicit and seamless user authentication, we have built an implicit authentication system with adaptive sampling that automatically selects dynamic sets of activities for user behavior extraction. Various activities, such as user location, application usage, user motion, and battery usage, have been popular sources of behaviors (the soft biometrics) for implicit authentication. Unlike password-based or hard biometric-based authentication, implicit authentication does not require explicit user action or expensive hardware. However, user behaviors can change unpredictably, which makes it more challenging to develop systems that depend on them. In addition to dynamic behavior extraction, the proposed implicit authentication system differs from existing systems in its energy efficiency on battery-powered mobile devices. Since implicit authentication systems rely on machine learning, the expensive training process needs to be outsourced to a remote server. However, mobile devices may not always have reliable network connections to send real-time data to the server for training. In addition, IA systems are still in their infancy and exhibit many limitations, one of which is how to determine the best retraining frequency when updating the user behavior model. Another limitation is how to gracefully degrade user privilege when a practical IA system fails to identify legitimate users (i.e., false negatives).
To address the retraining problem, we propose an algorithm that uses the Jensen-Shannon (JS) distance to determine the optimal retraining frequency, which is discussed in Chapter 2.
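As a rough illustration of how a Jensen-Shannon distance could signal that a behavior model has drifted, consider the sketch below. It is not the algorithm from Chapter 2: the histogram inputs, the helper `needs_retraining`, and the threshold value 0.3 are all hypothetical choices made for the example.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, in [0, 1]) between two histograms."""
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))


def needs_retraining(reference_hist, recent_hist, threshold=0.3):
    """Hypothetical trigger: retrain when recent behavior drifts past a threshold."""
    return js_distance(reference_hist, recent_hist) > threshold


if __name__ == "__main__":
    trained_on = [40, 30, 20, 10]   # e.g. app-usage counts at training time
    observed = [10, 20, 30, 40]     # counts from a recent window
    print(js_distance(trained_on, observed))
    print(needs_retraining(trained_on, observed))
```

The JS distance is symmetric and bounded, which makes a fixed threshold easier to interpret than one on the raw Kullback-Leibler divergence.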
We overcome the limitations of traditional IA by proposing the W-layer, an overlay that provides a practical and energy-efficient solution for implicit authentication on mobile devices. The W-layer is discussed in Chapters 3 and 4. In Chapter 5, a novel privilege-control mechanism, BubbleMap (BMap), is introduced to provide fine-grained privileges to users based on their behavioral scores. In the same chapter, we describe reinforcement authentication (RA) to achieve a more reliable authentication
Statistical Debugging using Latent Topic Models
Abstract. Statistical debugging uses machine learning to model program failures and help identify root causes of bugs. We approach this task using a novel Delta-Latent-Dirichlet-Allocation model. We model execution traces attributed to failed runs of a program as being generated by two types of latent topics: normal usage topics and bug topics. Execution traces attributed to successful runs of the same program, however, are modeled by usage topics only. Joint modeling of both kinds of traces allows us to identify weak bug topics that would otherwise remain undetected. We perform model inference with collapsed Gibbs sampling. In quantitative evaluations on four real programs, our model produces bug topics highly correlated to the true bugs, as measured by the Rand index. Qualitative evaluation by domain experts suggests that our model outperforms existing statistical methods for bug cause identification, and may help support other software tasks not addressed by earlier models.
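The Rand index used in the quantitative evaluation measures how often two groupings agree on whether a pair of items belongs together. A minimal sketch, assuming each failed run carries one label for its inferred bug topic and one for its true bug (the function name and input layout are illustrative, not taken from the paper):

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree (same vs. different group)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    inferred_topics = [0, 0, 1, 1, 2]
    true_bugs = ["a", "a", "b", "b", "b"]
    print(rand_index(inferred_topics, true_bugs))
```

The score depends only on the grouping, not on the label names, so inferred topic identifiers do not need to be matched to bug identifiers beforehand.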