72,161 research outputs found

    The More the Merrier: Leveraging on the Bug Inflow to Guide Software Maintenance

    Issue management, a central part of software maintenance, requires much effort for complex software systems. The continuous inflow of issue reports makes it hard for developers to stay on top of the situation, and the looming information overload makes activities such as duplicate management, Issue Assignment (IA), and Change Impact Analysis (CIA) tedious and error-prone. Still, most practitioners work with tools that act as little more than issue containers. Machine Learning (ML) encompasses approaches that identify patterns or make predictions based on empirical data. While humans have a limited ability to work with big data, ML tends to improve as more training data becomes available. Consequently, we argue that the challenge of information overload in issue management appears particularly well suited to ML-based tool support. While others have initially explored the area, we develop two ML-based tools and evaluate them in proprietary software engineering contexts. We replicated [1] for five projects in two companies, and our automated IA obtains an accuracy matching that of the current manual processes. Thus, as our solution delivers instantaneous IA, an organization can potentially save considerable analysis effort. Moreover, for the most comprehensive of the five projects, we implemented automated CIA in the tool ImpRec [3]. We evaluated the tool in a longitudinal in situ study, i.e., deployment in two development teams in industry. Based on log analysis and complementary interviews using the QUPER model [2] for utility assessment, we conclude that ImpRec offered helpful support in the CIA task.
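    Automated IA, as described above, can be framed as supervised text classification: past issue reports labelled with the team that handled them form the training data, and a new report is routed to the predicted team. The sketch below illustrates only that framing, using a plain multinomial Naive Bayes classifier written from scratch. The choice of learner is an assumption for illustration and not necessarily the one used in [1] or in this work; the issue texts and team names (`RadioTeam`, `UITeam`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayesAssigner:
    """Hypothetical IA helper: multinomial Naive Bayes mapping issue text to a team."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha                    # Laplace smoothing constant
        self.class_counts = Counter()         # number of training issues per team
        self.term_counts = defaultdict(Counter)  # term frequencies per team
        self.vocab = set()

    def fit(self, texts: list[str], teams: list[str]) -> "NaiveBayesAssigner":
        for text, team in zip(texts, teams):
            tokens = tokenize(text)
            self.class_counts[team] += 1
            self.term_counts[team].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_team, best_score = None, -math.inf
        for team, n_docs in self.class_counts.items():
            counts = self.term_counts[team]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # log prior plus the smoothed log likelihood of every known token
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_team, best_score = team, score
        return best_team


train_texts = [
    "crash in radio driver on boot",
    "radio driver timeout during handover",
    "ui button misaligned on settings page",
    "settings page font rendering broken",
]
train_teams = ["RadioTeam", "RadioTeam", "UITeam", "UITeam"]
model = NaiveBayesAssigner().fit(train_texts, train_teams)
print(model.predict("driver crash after handover"))  # routed to RadioTeam
```

    Laplace smoothing (`alpha`) keeps a single unseen team/term combination from driving a team's score to minus infinity.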

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (which consists of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
    Comment: Accepted for publication in ACM Computing Surveys
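    The three problems named above (document representation, classifier construction, and classifier evaluation) can be illustrated end to end in a few lines. The following is a minimal sketch assuming tf-idf weighting for representation, a nearest-centroid classifier (Rocchio-style, without the negative-example term) for construction, and per-category precision, recall, and F1 for evaluation. It is one simple instance of the machine learning paradigm, not a method the survey singles out, and the toy documents and category names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Representation: inverse document frequency learned from the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Representation: L2-normalised tf-idf vector; terms unseen in training are dropped."""
    tf = Counter(doc)
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def train_centroids(vectors, labels):
    """Classifier construction: average the training vectors of each category."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {lab: {t: w / counts[lab] for t, w in s.items()} for lab, s in sums.items()}


def classify(vec, centroids):
    """Pick the category whose centroid has the largest dot product with vec."""
    return max(centroids,
               key=lambda lab: sum(w * centroids[lab].get(t, 0.0) for t, w in vec.items()))


def precision_recall_f1(gold, pred, category):
    """Classifier evaluation for one category from its contingency counts."""
    tp = sum(g == category and p == category for g, p in zip(gold, pred))
    fp = sum(g != category and p == category for g, p in zip(gold, pred))
    fn = sum(g == category and p != category for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


train_docs = [d.split() for d in ["match goal team win", "team coach season goal",
                                  "stock market price fall", "market bank interest price"]]
train_labels = ["sports", "sports", "finance", "finance"]
idf = idf_table(train_docs)
centroids = train_centroids([vectorize(d, idf) for d in train_docs], train_labels)

test_docs = [d.split() for d in ["goal team score", "bank price rise"]]
gold = ["sports", "finance"]
pred = [classify(vectorize(d, idf), centroids) for d in test_docs]
print(pred, precision_recall_f1(gold, pred, "sports"))
```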

    Data-Driven Application Maintenance: Views from the Trenches

    In this paper we present our experience during the design, development, and pilot deployments of a data-driven, machine-learning-based application maintenance solution. We implemented a proof of concept to address a spectrum of interrelated problems encountered in application maintenance projects, including duplicate incident ticket identification, assignee recommendation, theme mining, and mapping of incidents to business processes. In the context of IT services, these problems are frequently encountered, yet there is a gap in applying automation and optimization to them. Despite long-standing research on the mining and analysis of software repositories, such research outputs have not been well adopted in practice due to the constraints these solutions impose on users. We discuss the need to design pragmatic solutions that have low barriers to adoption and that address the right level of problem complexity with respect to the underlying business constraints and the nature of the data.
    Comment: Earlier version of paper appearing in proceedings of the 4th International Workshop on Software Engineering Research and Industrial Practice (SER&IP), IEEE Press, pp. 48-54, 201
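    Of the problems listed above, duplicate incident ticket identification lends itself to a short illustration. The abstract does not state which technique the authors used, so this is a minimal sketch under assumptions: tickets are compared by cosine similarity of raw term-frequency vectors, and a new ticket is flagged as a likely duplicate of any open ticket scoring at or above a hypothetical threshold of 0.5. The ticket IDs and texts are made up.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Raw term frequencies over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_duplicates(new_ticket: str, open_tickets: dict[str, str],
                    threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (ticket_id, similarity) pairs at or above threshold, most similar first."""
    new_vec = term_vector(new_ticket)
    scored = [(tid, cosine(new_vec, term_vector(text))) for tid, text in open_tickets.items()]
    return sorted((pair for pair in scored if pair[1] >= threshold),
                  key=lambda pair: pair[1], reverse=True)


open_tickets = {
    "INC-101": "Payroll batch job failed with timeout",
    "INC-102": "Login page shows error 500",
    "INC-103": "Printer on floor 3 not responding",
}
result = find_duplicates("payroll batch job timeout failure", open_tickets)
print(result)  # only INC-101 passes the threshold
```

    Lowering `threshold` trades precision for recall; in practice it would need tuning on labelled duplicate pairs.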

    Rational bidding using reinforcement learning: an application in automated resource allocation

    The application of autonomous agents to the provisioning and usage of computational resources is an attractive research field. Various methods and technologies from artificial intelligence, statistics, and economics work together to achieve (i) autonomic provisioning and usage of computational resources, (ii) competitive bidding strategies for widely used market mechanisms, and (iii) incentives for consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a framework for supporting consumers and providers in technical and economic preference elicitation and the generation of bids. Second, we introduce a consumer-side reinforcement learning bidding strategy that enables rational behavior through the generation and selection of bids. Third, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms: one centralized and one decentralized.
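    The abstract does not specify the paper's reinforcement learning strategy or its market mechanisms, so the following is only a deliberately simplified stand-in for the idea of learning a bid and comparing it with truth-telling. It assumes a stateless epsilon-greedy bandit (the simplest form of reinforcement learning) choosing among hypothetical bid-shading factors in a hypothetical first-price sealed-bid auction, where the competing bid is drawn uniformly from [0, value). None of the parameters come from the paper.

```python
import random


def run_auction(bid: float, value: float, rng: random.Random) -> float:
    """One hypothetical first-price round: win if bid beats a uniform competing bid.

    The winner pays its own bid, so the consumer's utility is value - bid on a win
    and 0 on a loss.
    """
    competing = rng.uniform(0.0, value)
    return value - bid if bid > competing else 0.0


def learn_bid_factor(factors, value=10.0, rounds=20_000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over bid-shading factors (bid = factor * value)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(factors)  # running mean utility per factor
    pulls = [0] * len(factors)
    total_utility = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(len(factors))                            # explore
        else:
            arm = max(range(len(factors)), key=lambda i: estimates[i])  # exploit
        utility = run_auction(factors[arm] * value, value, rng)
        pulls[arm] += 1
        # incremental sample-average update of the chosen factor's estimate
        estimates[arm] += (utility - estimates[arm]) / pulls[arm]
        total_utility += utility
    best = max(range(len(factors)), key=lambda i: estimates[i])
    return factors[best], total_utility / rounds


def truthful_average(value=10.0, rounds=20_000, seed=0):
    """Baseline: always bid the true value."""
    rng = random.Random(seed)
    return sum(run_auction(value, value, rng) for _ in range(rounds)) / rounds


factors = [0.3, 0.5, 0.7, 0.9, 1.0]
best, learned_avg = learn_bid_factor(factors)
print(best, learned_avg, truthful_average())
```

    In this toy market truth-telling always wins but pays the full value, so its utility is exactly 0, whereas the expected utility of a bid b is (b/10)(10 - b), which peaks at b = 5; the bandit learns to shade its bid accordingly.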

    A review of the state of the art in Machine Learning on the Semantic Web: Technical Report CSTR-05-003

    Toward Optimal Feature Selection in Naive Bayes for Text Categorization

    Automated feature selection is important for text categorization to reduce the feature size and to speed up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH divergence, we develop two efficient feature selection methods for text categorization, termed maximum discrimination (MD) and MD-$\chi^2$ methods. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
    Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figures
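    The Jeffreys divergence mentioned above is the symmetrised Kullback-Leibler divergence, $J(P, Q) = KL(P \| Q) + KL(Q \| P) = \sum_t (p_t - q_t) \ln(p_t / q_t)$. Each summand is non-negative, because $p_t - q_t$ and $\ln(p_t / q_t)$ always share a sign, so a term's summand can serve as a per-feature discrimination score. Below is a minimal binary-class sketch, assuming Laplace-smoothed multinomial class-conditional term distributions as in Naive Bayes. It does not reproduce the JMH divergence or the MD and MD-$\chi^2$ methods, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_distribution(docs, vocab, alpha=1.0):
    """Laplace-smoothed multinomial P(term | class), as in a Naive Bayes model."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """KL(P || Q) for two distributions over the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def jeffreys_contributions(p, q):
    """Per-term summands of J(P, Q) = sum_t (p_t - q_t) * ln(p_t / q_t); all >= 0."""
    return {t: (p[t] - q[t]) * math.log(p[t] / q[t]) for t in p}


def rank_features(pos_docs, neg_docs, k):
    """Return the k terms that contribute most to the Jeffreys divergence."""
    vocab = {t for doc in pos_docs + neg_docs for t in doc}
    p = term_distribution(pos_docs, vocab)
    q = term_distribution(neg_docs, vocab)
    scores = jeffreys_contributions(p, q)
    return sorted(scores, key=scores.get, reverse=True)[:k]


pos_docs = [["goal", "team", "win"], ["goal", "match"]]
neg_docs = [["stock", "price"], ["price", "market", "team"]]
print(rank_features(pos_docs, neg_docs, k=2))  # "goal" and "price" (tied scores)
```

    The shared term `team` has equal smoothed probability in both classes, so its score is exactly 0 and it ranks last.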