
    Complexity modelling for case knowledge maintenance in case-based reasoning.

    Case-based reasoning solves new problems by re-using the solutions of previously solved similar problems, and is popular because many of the knowledge engineering demands of conventional knowledge-based systems are removed. The content of the case knowledge container is critical to the performance of case-based classification systems. However, the knowledge engineer is given little support in selecting suitable techniques to maintain and monitor the case base. This research investigates the coverage, competence and problem-solving capacity of case knowledge with the aim of developing techniques to model and maintain the case base.

    We present a novel technique that creates a model of the case base by measuring the uncertainty in local areas of the problem space, based on the local mix of solutions present. The model provides insight into the structure of a case base by means of a complexity profile that can assist maintenance decision-making and provide a benchmark against which to assess future changes to the case base.

    The distribution of cases in the case base is critical to the performance of a case-based reasoning system. We argue that classification boundaries represent important regions of the problem space and develop two complexity-guided algorithms which use boundary identification techniques to actively discover cases close to boundaries. We introduce a complexity-guided redundancy reduction algorithm which uses a case complexity threshold to retain cases close to boundaries and delete cases that form single-class clusters. The algorithm offers control over the balance between maintaining competence and reducing case base size.

    The performance of a case-based reasoning system relies on the integrity of its case base, but in real-life applications the available data invariably contains erroneous, noisy cases. Automated removal of these noisy cases can improve system accuracy. In addition, error rates can often be reduced by removing cases to give smoother decision boundaries between classes. We show that the optimal level of boundary smoothing is domain dependent and, therefore, our approach to error reduction reacts to the characteristics of the domain by setting an appropriate level of smoothing. We introduce a novel algorithm which identifies and removes both noisy and boundary cases with the aid of a local distance ratio.

    A prototype interface has been developed that shows how the modelling and maintenance approaches can be used in practice in an interactive manner. The interface allows the knowledge engineer to make informed maintenance choices without the need for extensive evaluation effort while, at the same time, retaining control over the process. One of the strengths of our approach is in applying a consistent, integrated method to case base maintenance to provide a transparent process that gives a degree of explanation.
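    As a concrete illustration, here is a minimal Python sketch of a local-complexity profile and threshold-based redundancy reduction in the spirit described above, assuming a k-nearest-neighbour reading of the "local mix of solutions"; the function names, the disagreement measure and the default parameters are illustrative assumptions, not this work's exact formulation:

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def case_complexity(X, y, k=5):
            """Fraction of each case's k nearest neighbours carrying a
            different class label: 0 inside single-class clusters, high
            near classification boundaries."""
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
            _, idx = nn.kneighbors(X)          # idx[:, 0] is the case itself
            neighbour_labels = y[idx[:, 1:]]   # drop the self-match
            return (neighbour_labels != y[:, None]).mean(axis=1)

        def reduce_case_base(X, y, threshold=0.2, k=5):
            """Keep cases whose local complexity meets the threshold,
            i.e. cases close to boundaries; delete the rest."""
            keep = case_complexity(X, y, k) >= threshold
            return X[keep], y[keep]

    Under this reading, cases with complexity 0 sit in single-class clusters and are deleted, and raising the threshold trades case base size against retained competence.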

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
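    To make the setting concrete, a minimal sketch of one standard OCC technique among those the review covers: a one-class SVM fitted on positive examples only, then used to flag unseen points as inliers or outliers (the data and parameter values here are illustrative):

        import numpy as np
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(0)
        X_pos = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # positives only

        # nu bounds the fraction of training points treated as outliers
        occ = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_pos)

        X_new = np.array([[0.1, -0.2],   # close to the training cloud
                          [6.0, 6.0]])   # far away: expected outlier
        print(occ.predict(X_new))        # +1 = inlier, -1 = outlier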

    How to Find More Supernovae with Less Work: Object Classification Techniques for Difference Imaging

    We present the results of applying new object classification techniques to difference images in the context of the Nearby Supernova Factory supernova search. Most current supernova searches subtract reference images from new images, identify objects in these difference images, and apply simple threshold cuts on parameters such as statistical significance, shape, and motion to reject objects such as cosmic rays, asteroids, and subtraction artifacts. Although most static objects subtract cleanly, even a very low false positive detection rate can lead to hundreds of non-supernova candidates which must be vetted by human inspection before triggering additional followup. In comparison to simple threshold cuts, more sophisticated methods such as Boosted Decision Trees, Random Forests, and Support Vector Machines provide dramatically better object discrimination. At the Nearby Supernova Factory, we reduced the number of non-supernova candidates by a factor of 10 while increasing our supernova identification efficiency. Methods such as these will be crucial for maintaining a reasonable false positive rate in the automated transient alert pipelines of upcoming projects such as Pan-STARRS and LSST.
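    A minimal sketch of the underlying idea, replacing hard threshold cuts with a learned classifier over per-candidate features; the feature set and the synthetic labels below are illustrative stand-ins, not the paper's actual pipeline:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        n = 1000
        # Toy per-candidate features: significance, shape, apparent motion
        X = np.column_stack([
            rng.normal(5.0, 2.0, n),      # statistical significance
            rng.normal(1.0, 0.3, n),      # shape parameter
            rng.exponential(0.5, n),      # motion between exposures
        ])
        y = rng.integers(0, 2, n)         # 1 = real transient, 0 = artifact

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X_tr, y_tr)

        # Rank candidates by score so humans vet only the top of the list,
        # instead of everything that survives hard cuts.
        scores = clf.predict_proba(X_te)[:, 1]
        print("highest-scoring candidates:", np.argsort(scores)[::-1][:5])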

    On a continuation approach in Tikhonov regularization and its application in piecewise-constant parameter identification

    We present a new approach to convexification of Tikhonov regularization using a continuation method strategy. We embed the original minimization problem into a one-parameter family of minimization problems. Both the penalty term and the minimizer of the Tikhonov functional become dependent on a continuation parameter. In this way we can independently treat the two main roles of the regularization term: stabilization of the ill-posed problem and introduction of the a priori knowledge. For zero continuation parameter we solve a relaxed regularization problem, which stabilizes the ill-posed problem in a weaker sense. The problem is recast to the original minimization by the continuation method, and so the a priori knowledge is enforced. We apply this approach in the context of topology-to-shape geometry identification, where it allows us to avoid convergence of gradient-based methods to local minima. We present illustrative results for magnetic induction tomography, which is an example of a PDE-constrained inverse problem.
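    A minimal sketch of the continuation strategy, assuming a 1-D signal, a quadratic relaxed penalty at t = 0 and a smoothed total-variation penalty (one plausible stand-in for a piecewise-constant prior) at t = 1; the specific functionals, step sizes and names are illustrative, not the paper's exact construction:

        import numpy as np

        def solve_stage(A, b, alpha, t, x0, steps=500, lr=1e-3, eps=1e-3):
            """Gradient descent on ||Ax - b||^2 + alpha * ((1 - t) * ||x||^2
            + t * sum_i sqrt((x[i+1] - x[i])^2 + eps^2))."""
            x = x0.copy()
            for _ in range(steps):
                dx = np.diff(x)
                w = dx / np.sqrt(dx**2 + eps**2)   # smoothed-TV term
                grad_tv = np.zeros_like(x)
                grad_tv[:-1] -= w
                grad_tv[1:] += w
                grad = 2 * A.T @ (A @ x - b) \
                       + alpha * (2 * (1 - t) * x + t * grad_tv)
                x -= lr * grad
            return x

        def continuation(A, b, alpha, n_stages=10):
            x = np.zeros(A.shape[1])     # t = 0: relaxed, easier problem
            for t in np.linspace(0.0, 1.0, n_stages):
                x = solve_stage(A, b, alpha, t, x)  # warm start from prior t
            return x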

    Semi-supervised model-based clustering with controlled clusters leakage

    In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters which combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied in discovering meaningful groups in partially classified data.
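    A minimal sketch of leakage-controlled semi-supervised mixture fitting. This is not the C3L algorithm itself: here each category is simply tied to one Gaussian component, and a leakage parameter eps blends the hard label with the unsupervised posterior, loosely mimicking a controlled-leakage constraint:

        import numpy as np
        from scipy.stats import multivariate_normal

        def fit_leaky_gmm(X, labels, n_components, eps=0.1, n_iter=50):
            """labels[i] is the category of point i, or -1 if unlabelled."""
            n, d = X.shape
            rng = np.random.default_rng(0)
            means = X[rng.choice(n, n_components, replace=False)]
            covs = np.array([np.eye(d)] * n_components)
            weights = np.full(n_components, 1.0 / n_components)
            for _ in range(n_iter):
                # E-step: standard mixture posteriors...
                resp = np.array([w * multivariate_normal.pdf(X, m, c)
                                 for w, m, c in zip(weights, means, covs)]).T
                resp /= resp.sum(axis=1, keepdims=True)
                # ...blended with the hard label for labelled points; eps
                # controls how far a point may "leak" from its category.
                for i in np.flatnonzero(labels >= 0):
                    hard = np.eye(n_components)[labels[i]]
                    resp[i] = (1 - eps) * hard + eps * resp[i]
                # M-step: weighted Gaussian parameter updates
                nk = resp.sum(axis=0)
                weights = nk / n
                means = (resp.T @ X) / nk[:, None]
                for k in range(n_components):
                    diff = X - means[k]
                    covs[k] = ((resp[:, k, None] * diff).T @ diff / nk[k]
                               + 1e-6 * np.eye(d))
            return weights, means, covs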