
    An application of machine learning to the organization of institutional software repositories

    Software reuse has become a major goal in the development of space systems, as a recent NASA-wide workshop on the subject made clear. The Data Systems Technology Division of Goddard Space Flight Center has been working on tools and techniques for promoting reuse, in particular in the development of satellite ground support software. One of these tools is the Experiment in Libraries via Incremental Schemata and Cobweb (ElvisC). ElvisC applies machine learning to the problem of organizing a reusable software component library for efficient and reliable retrieval. In this paper we describe the background factors that have motivated this work, present the design of the system, and evaluate the results of its application.
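
    As a rough illustration of the idea (not the actual ElvisC design), the sketch below incrementally groups reusable components into clusters by shared attributes, COBWEB-style, and answers retrieval queries against the best-matching cluster; the component names and attributes are invented.

```python
# Minimal sketch of incrementally organizing reusable software components
# into clusters by shared attributes, loosely inspired by COBWEB-style
# incremental concept formation. Component names and attributes are
# hypothetical illustrations, not the real ElvisC schemata.

class Cluster:
    def __init__(self):
        self.members = []          # component names
        self.attr_counts = {}      # attribute -> frequency within cluster

    def score(self, attrs):
        """Fraction of the component's attributes already common in the cluster."""
        if not self.members:
            return 0.0
        hits = sum(self.attr_counts.get(a, 0) / len(self.members) for a in attrs)
        return hits / len(attrs)

    def add(self, name, attrs):
        self.members.append(name)
        for a in attrs:
            self.attr_counts[a] = self.attr_counts.get(a, 0) + 1


def organize(components, threshold=0.5):
    """Incrementally place each component into the best-matching cluster."""
    clusters = []
    for name, attrs in components:
        best = max(clusters, key=lambda c: c.score(attrs), default=None)
        if best is None or best.score(attrs) < threshold:
            best = Cluster()
            clusters.append(best)
        best.add(name, attrs)
    return clusters


def retrieve(clusters, query_attrs):
    """Return components from the cluster that best matches a query."""
    best = max(clusters, key=lambda c: c.score(query_attrs), default=None)
    return best.members if best else []


if __name__ == "__main__":
    library = [
        ("telemetry_decoder", {"telemetry", "decoding", "packet"}),
        ("frame_sync",        {"telemetry", "synchronization", "packet"}),
        ("orbit_propagator",  {"orbit", "dynamics", "prediction"}),
        ("ephemeris_reader",  {"orbit", "ephemeris", "file-io"}),
    ]
    clusters = organize(library)
    print(retrieve(clusters, {"telemetry", "packet"}))
```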

    On Cognitive Preferences and the Plausibility of Rule-based Models

    It is conventional wisdom in machine learning and data mining that logical models such as rule sets are more interpretable than other models, and that among such rule-based models, simpler models are more interpretable than more complex ones. In this position paper, we question this latter assumption by focusing on one particular aspect of interpretability, namely the plausibility of models. Roughly speaking, we equate the plausibility of a model with the likeliness that a user accepts it as an explanation for a prediction. In particular, we argue that, all other things being equal, longer explanations may be more convincing than shorter ones, and that the predominant bias for shorter models, which is typically necessary for learning powerful discriminative models, may not be suitable when it comes to user acceptance of the learned models. To that end, we first recapitulate evidence for and against this postulate, and then report the results of an evaluation in a crowd-sourcing study based on about 3,000 judgments. The results do not reveal a strong preference for simple rules, whereas we can observe a weak preference for longer rules in some domains. We then relate these results to well-known cognitive biases such as the conjunction fallacy, the representativeness heuristic, or the recognition heuristic, and investigate their relation to rule length and plausibility. (Comment: V4: another rewrite of the section on interpretability to clarify the focus on plausibility and its relation to interpretability, comprehensibility, and justifiability.)
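
    The tension described above can be made concrete with a toy example (the rules and data below are invented): a longer rule can be just as confident as a shorter one while covering fewer cases, so a preference between them reflects a bias, whether the learner's bias for brevity or a user's preference for more detailed, seemingly more plausible explanations, rather than anything forced by the data.

```python
# Toy illustration (invented data): a short rule and a longer rule can be
# equally accurate on the data, so any preference between them reflects a
# bias (e.g. for simplicity or for more "convincing" detail), not the data.

records = [
    {"fever": True,  "cough": True,  "fatigue": True,  "flu": True},
    {"fever": True,  "cough": True,  "fatigue": False, "flu": True},
    {"fever": True,  "cough": False, "fatigue": True,  "flu": False},
    {"fever": False, "cough": True,  "fatigue": True,  "flu": False},
]

def evaluate(rule, target, data):
    """Return (coverage, confidence) of an IF-conditions-THEN-target rule."""
    covered = [r for r in data if all(r[a] == v for a, v in rule)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for r in covered if r[target])
    return len(covered) / len(data), correct / len(covered)

short_rule = [("fever", True), ("cough", True)]
long_rule  = [("fever", True), ("cough", True), ("fatigue", True)]  # extra condition

print("short:", evaluate(short_rule, "flu", records))  # covers 2 records, confidence 1.0
print("long: ", evaluate(long_rule, "flu", records))   # covers 1 record, confidence 1.0
```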

    A knowledge-based system with learning for computer communication network design

    Computer communication network design is well known to be a complex and hard problem; for that reason, the most effective methods used to solve it are heuristic. The weaknesses of these techniques are listed, and a new approach based on artificial intelligence for solving this problem is presented. This approach is particularly recommended for large packet-switched communication networks, in the sense that it permits a high degree of reliability and offers a very flexible environment for dealing with many relevant design parameters such as link cost, link capacity, and message delay.
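
    A minimal sketch of the kind of greedy heuristic such a system might reason about, assuming invented sites, candidate links, and thresholds: connect all sites at minimum link cost while respecting capacity and delay constraints.

```python
# Minimal sketch of a greedy heuristic for packet-switched network topology
# design: connect all sites with minimum total link cost, considering only
# candidate links that meet capacity and delay requirements. The sites,
# links, and thresholds are invented for illustration.

# (site_a, site_b, cost, capacity_mbps, delay_ms)
candidate_links = [
    ("A", "B", 10, 100, 5),
    ("A", "C", 25,  50, 9),
    ("B", "C", 12, 100, 6),
    ("B", "D", 30,  20, 15),   # fails the capacity requirement
    ("C", "D", 18, 100, 7),
]

MIN_CAPACITY = 50   # Mbps required on every selected link
MAX_DELAY = 10      # ms allowed on every selected link

def design_topology(links, sites):
    """Kruskal-style greedy selection of feasible, cheapest links."""
    parent = {s: s for s in sites}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    feasible = [l for l in links if l[3] >= MIN_CAPACITY and l[4] <= MAX_DELAY]
    chosen = []
    for a, b, cost, cap, delay in sorted(feasible, key=lambda l: l[2]):
        ra, rb = find(a), find(b)
        if ra != rb:               # only add links that join separate components
            parent[ra] = rb
            chosen.append((a, b, cost))
    return chosen

sites = {"A", "B", "C", "D"}
print(design_topology(candidate_links, sites))
# e.g. [('A', 'B', 10), ('B', 'C', 12), ('C', 'D', 18)]
```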

    Knowledge-based design support and inductive learning

    Designing and learning are closely related activities, in that design as an ill-structured problem involves identifying the problem of the design as well as finding its solutions. A knowledge-based design support system should support learning by capturing and reusing design knowledge. This thesis addresses two fundamental problems in computational support for design activities: the development of an intelligent design support system architecture and the integration of inductive learning techniques in this architecture.

    This research is motivated by the belief that (1) the early stage of the design process can be modelled as an incremental learning process in which the structure of a design problem or the product data model of an artefact is developed using inductive learning techniques, and (2) the capability of a knowledge-based design support system can be enhanced by accumulating and storing reusable design product and process information.

    In order to incorporate inductive learning techniques into a knowledge-based design model and an integrated knowledge-based design support system architecture, the computational techniques for developing a knowledge-based design support system architecture and the role of inductive learning in AI-based design are investigated. This investigation gives a background to the development of an incremental learning model for design suitable for a class of design tasks whose structures are not well known initially.

    This incremental learning model for design is used as a basis to develop a knowledge-based design support system architecture that can serve as a kernel for knowledge-based design applications. This architecture integrates a number of computational techniques to support the representation and reasoning of design knowledge. In particular, it integrates a blackboard control system with an assumption-based truth maintenance system in an object-oriented environment to support the exploration of multiple design solutions through the exploration and management of design contexts.

    As an integral part of this knowledge-based design support architecture, a design concept learning system utilising a number of unsupervised inductive learning techniques is developed. This design concept learning system combines concept formation techniques with design heuristics as background knowledge to build a design concept tree from raw data or past design examples. The design concept tree is used as a conceptual structure for the exploration of new designs.

    The effectiveness of this knowledge-based design support architecture and the design concept learning system is demonstrated in a realistic design domain: the design of small-molecule drugs, one of the key tasks of which is to identify a pharmacophore description (the structure of a design problem) from known molecule examples.

    In this thesis, knowledge-based design and inductive learning techniques are first reviewed. Based on this review, an incremental learning model and an integrated architecture for intelligent design support are presented. The implementation of this architecture and a design concept learning system is then described. The application of the architecture and the design concept learning system in the domain of small-molecule drug design is then discussed. Finally, the evaluation of the architecture and the design concept learning system within and beyond this particular domain, and future research directions, are discussed.
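
    A minimal sketch of the blackboard-style control described above, with invented knowledge sources and design attributes: independent knowledge sources inspect a shared blackboard and post contributions until none can act (the assumption-based truth maintenance and context management are omitted).

```python
# Minimal sketch of a blackboard-style control loop: independent knowledge
# sources inspect a shared blackboard and post contributions until no source
# can act. The design attributes and knowledge sources are invented for
# illustration; the assumption-based truth maintenance system and design
# contexts of the real architecture are omitted here.

def propose_requirements(bb):
    if "requirements" not in bb:
        bb["requirements"] = ["low toxicity", "high binding affinity"]
        return True
    return False

def propose_structure(bb):
    if "requirements" in bb and "candidate_structure" not in bb:
        bb["candidate_structure"] = "scaffold-X with polar substituent"
        return True
    return False

def evaluate_candidate(bb):
    if "candidate_structure" in bb and "evaluation" not in bb:
        bb["evaluation"] = "meets requirements (toy check)"
        return True
    return False

def run_blackboard(knowledge_sources):
    blackboard = {}
    progress = True
    while progress:                 # control loop: fire one applicable source per cycle
        progress = any(ks(blackboard) for ks in knowledge_sources)
    return blackboard

if __name__ == "__main__":
    result = run_blackboard([propose_requirements, propose_structure, evaluate_candidate])
    for key, value in result.items():
        print(key, "->", value)
```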

    Machine learning and its applications in reliability analysis systems

    In this thesis, we are interested in exploring some aspects of Machine Learning (ML) and its application in Reliability Analysis systems (RAs). We begin by investigating some ML paradigms and their techniques, go on to discuss the possible applications of ML in improving the performance of RAs, and lastly give guidelines for the architecture of learning RAs. Our survey of ML covers both neural network learning and symbolic learning. Within symbolic learning, five types of learning and their applications are discussed: rote learning, learning from instruction, learning by analogy, learning from examples, and learning from observation and discovery. The Reliability Analysis systems (RAs) presented in this thesis are mainly designed for maintaining plant safety, supported by two functions: a risk analysis function, i.e., failure mode and effects analysis (FMEA), and a diagnosis function, i.e., real-time fault location (RTFL). Three approaches to creating the RAs are discussed. According to the results of our survey, we suggest that currently the best design for RAs is to embed a model-based RA, i.e., MORA (as software), in a neural-network-based computer system (as hardware). However, there are still improvements that can be made through the application of Machine Learning. By implanting a 'learning element', MORA becomes the learning MORA (La MORA) system, a learning Reliability Analysis system with the power of automatic knowledge acquisition, inconsistency checking, and more. To conclude the thesis, we propose an architecture for La MORA.
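
    A toy sketch of the 'learning element' idea, with invented components, failure modes, and effects: newly acquired FMEA-style rules are checked for inconsistency against the existing knowledge base before being accepted.

```python
# Toy sketch of automatic knowledge acquisition with inconsistency checking:
# a learned FMEA-style rule is accepted only if it does not contradict an
# existing assertion. Components, failure modes, and effects are invented.

knowledge_base = {
    # (component, failure_mode) -> asserted effect
    ("pump", "bearing wear"): "reduced flow",
    ("valve", "stuck closed"): "loss of cooling",
}

def acquire_rule(kb, component, failure_mode, effect):
    """Add a learned rule unless it contradicts an existing assertion."""
    key = (component, failure_mode)
    if key in kb and kb[key] != effect:
        return False, f"inconsistent with existing rule: {key} -> {kb[key]}"
    kb[key] = effect
    return True, "accepted"

# A consistent new rule is accepted; a contradictory one is flagged.
print(acquire_rule(knowledge_base, "sensor", "drift", "false alarm"))
print(acquire_rule(knowledge_base, "pump", "bearing wear", "overpressure"))
```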

    Machine Learning in Automated Text Categorization

    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting of the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation. (Comment: accepted for publication in ACM Computing Surveys.)
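
    A minimal sketch of the pipeline the survey covers, covering document representation, classifier construction, and classifier evaluation, using scikit-learn on invented example documents.

```python
# Minimal text-categorization sketch: bag-of-words representation, a
# classifier induced from preclassified documents, and evaluation on a
# held-out set. The documents and categories are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

train_docs = [
    "stock markets fell sharply amid earnings reports",
    "the central bank raised interest rates again",
    "the team won the championship after extra time",
    "the striker scored twice in the final match",
]
train_labels = ["finance", "finance", "sports", "sports"]

test_docs = ["interest rates and bank earnings", "a late goal won the match"]
test_labels = ["finance", "sports"]

# Document representation: weighted bag-of-words vectors.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# Classifier construction: induce a Naive Bayes model from labeled documents.
classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

# Classifier evaluation: compare predictions with the held-out labels.
predictions = classifier.predict(X_test)
print(classification_report(test_labels, predictions))
```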

    Learning rules from multisource data for cardiac monitoring

    This paper formalises the concept of learning symbolic rules from multisource data in a cardiac monitoring context. Our sources, electrocardiograms and arterial blood pressure measures, describe cardiac behaviours from different viewpoints. To learn interpretable rules, we use an Inductive Logic Programming (ILP) method. We develop an original strategy to cope with the dimensionality issues caused by using this ILP technique on a rich multisource language. The results show that our method greatly improves the feasibility and the efficiency of the process while remaining accurate. They also confirm the benefits of using multiple sources to improve the diagnosis of cardiac arrhythmias.
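
    The actual work learns relational rules with ILP; the sketch below is a much-simplified propositional stand-in, a separate-and-conquer rule learner over invented features combined from the two sources, meant only to illustrate learning symbolic rules from multisource data.

```python
# Much-simplified sketch of learning symbolic rules from features that
# combine two sources (ECG and arterial pressure). The real work uses
# Inductive Logic Programming over relational descriptions; this toy
# version is a propositional separate-and-conquer learner on invented data.

examples = [
    # each example: (attribute dict, arrhythmia present?)
    ({"ecg_rhythm": "irregular", "p_wave": "absent",  "pressure_drop": True},  True),
    ({"ecg_rhythm": "irregular", "p_wave": "absent",  "pressure_drop": False}, True),
    ({"ecg_rhythm": "regular",   "p_wave": "present", "pressure_drop": False}, False),
    ({"ecg_rhythm": "regular",   "p_wave": "present", "pressure_drop": True},  False),
]

def candidate_conditions(data):
    conds = set()
    for attrs, _ in data:
        conds.update(attrs.items())
    return conds

def precision(rule, data):
    covered = [(a, y) for a, y in data if all(a[k] == v for k, v in rule)]
    if not covered:
        return 0.0, 0
    pos = sum(1 for _, y in covered if y)
    return pos / len(covered), len(covered)

def learn_rules(data):
    """Separate-and-conquer: grow one rule at a time until positives are covered."""
    rules, remaining = [], list(data)
    while any(y for _, y in remaining):
        rule = []
        while True:
            prec, _ = precision(rule, remaining)
            if prec == 1.0 and rule:
                break
            # Greedily add the condition that most improves rule precision.
            best = max(candidate_conditions(remaining),
                       key=lambda c: precision(rule + [c], remaining)[0])
            if precision(rule + [best], remaining)[0] <= prec:
                break
            rule.append(best)
        rules.append(rule)
        remaining = [(a, y) for a, y in remaining
                     if not all(a[k] == v for k, v in rule)]
    return rules

for r in learn_rules(examples):
    print("IF", " AND ".join(f"{k}={v}" for k, v in r), "THEN arrhythmia")
```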

    Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

    For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best class distribution for learning. The naturally occurring class distribution is shown to generally perform well when classifier performance is evaluated using undifferentiated error rate (0/1 loss). However, when the area under the ROC curve is used to evaluate classifier performance, a balanced distribution is shown to perform well. Since neither of these choices for class distribution always generates the best-performing classifier, we introduce a budget-sensitive progressive sampling algorithm for selecting training examples based on the class associated with each example. An empirical analysis of this algorithm shows that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.
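
    A sketch of the core comparison on synthetic scikit-learn data (not the article's experimental setup): with a fixed training budget n, compare error rate and AUC for a tree trained on the natural class distribution versus a balanced one.

```python
# Sketch of the core comparison: with a fixed training budget n, train a
# classification tree on (a) the natural class distribution and (b) a
# balanced one, then compare error rate and AUC. Synthetic data generated
# with scikit-learn; an illustration, not the article's experimental setup.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3,
                                                  stratify=y, random_state=0)
n = 600  # fixed training budget

def sample(X, y, n, positive_fraction, rng):
    """Draw n training examples with the requested class proportion."""
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]
    n_pos = int(n * positive_fraction)
    chosen = np.concatenate([rng.choice(pos_idx, n_pos, replace=False),
                             rng.choice(neg_idx, n - n_pos, replace=False)])
    return X[chosen], y[chosen]

rng = np.random.default_rng(0)
natural_fraction = y_pool.mean()          # roughly the natural minority proportion
for name, frac in [("natural", natural_fraction), ("balanced", 0.5)]:
    X_tr, y_tr = sample(X_pool, y_pool, n, frac, rng)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    error = 1 - accuracy_score(y_test, tree.predict(X_test))
    auc = roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])
    print(f"{name:8s}  error={error:.3f}  AUC={auc:.3f}")
```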