Complexity modelling for case knowledge maintenance in case-based reasoning.
Case-based reasoning solves new problems by re-using the solutions of previously solved similar problems, and is popular because it removes many of the knowledge engineering demands of conventional knowledge-based systems. The content of the case knowledge container is critical to the performance of case-based classification systems. However, the knowledge engineer is given little support in selecting suitable techniques to maintain and monitor the case base. This research investigates the coverage, competence and problem-solving capacity of case knowledge with the aim of developing techniques to model and maintain the case base.

We present a novel technique that creates a model of the case base by measuring the uncertainty in local areas of the problem space, based on the local mix of solutions present. The model provides insight into the structure of a case base by means of a complexity profile that can assist maintenance decision-making and provide a benchmark against which to assess future changes to the case base.

The distribution of cases in the case base is critical to the performance of a case-based reasoning system. We argue that classification boundaries represent important regions of the problem space and develop two complexity-guided algorithms which use boundary identification techniques to actively discover cases close to boundaries. We introduce a complexity-guided redundancy reduction algorithm which uses a case complexity threshold to retain cases close to boundaries and delete cases that form single-class clusters. The algorithm offers control over the balance between maintaining competence and reducing case base size.

The performance of a case-based reasoning system relies on the integrity of its case base, but in real-life applications the available data invariably contains erroneous, noisy cases. Automated removal of these noisy cases can improve system accuracy. In addition, error rates can often be reduced by removing cases to give smoother decision boundaries between classes. We show that the optimal level of boundary smoothing is domain dependent; our approach to error reduction therefore reacts to the characteristics of the domain by setting an appropriate level of smoothing. We introduce a novel algorithm which identifies and removes both noisy and boundary cases with the aid of a local distance ratio.

A prototype interface has been developed that shows how the modelling and maintenance approaches can be used in practice in an interactive manner. The interface allows the knowledge engineer to make informed maintenance choices without the need for extensive evaluation effort while, at the same time, retaining control over the process. One of the strengths of our approach is in applying a consistent, integrated method to case base maintenance to provide a transparent process that gives a degree of explanation.
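The complexity-guided redundancy reduction idea can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the function names, the k-nearest-neighbour complexity measure, and the toy case base are our assumptions. A case's local complexity is taken to be the fraction of its k nearest neighbours carrying a different class label; cases inside single-class clusters score zero and are deleted, while cases near classification boundaries score above the threshold and are retained.

```python
# Illustrative sketch of complexity-guided redundancy reduction
# (hypothetical names and measure; the thesis's algorithm may differ).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def local_complexity(case, case_base, k=2):
    """Fraction of the k nearest neighbours with a different class label."""
    features, label = case
    others = sorted((c for c in case_base if c is not case),
                    key=lambda c: euclidean(c[0], features))
    neighbours = others[:k]
    return sum(1 for _, lbl in neighbours if lbl != label) / k

def reduce_case_base(case_base, threshold=0.0, k=2):
    """Keep cases whose local complexity exceeds the threshold."""
    return [c for c in case_base
            if local_complexity(c, case_base, k) > threshold]

# Two well-separated 1-D clusters: interior cases are redundant;
# only the closest cases of opposite classes (the boundary) survive.
cases = [((0.0,), "A"), ((0.1,), "A"), ((0.9,), "A"),
         ((1.1,), "B"), ((1.9,), "B"), ((2.0,), "B")]
kept = reduce_case_base(cases, k=2)  # boundary cases at 0.9 and 1.1
```

Raising the threshold deletes more of the case base; lowering it preserves more competence, which mirrors the balance the abstract describes.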
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
requiring the class boundary to be defined using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, based on the availability of training data, the
algorithms used, and the application domains. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
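The core OCC setting described above — learn a description of the positive class alone and flag everything outside it — can be sketched with a deliberately simple model. This is a minimal illustration of the problem setting, not any particular algorithm from the survey; the centroid-plus-radius description and the 95th-percentile threshold are our assumptions.

```python
# Minimal one-class classifier: fit a centroid and a distance
# threshold from positive examples only; anything farther away
# than the threshold is declared an outlier. Illustrative only.

def fit_one_class(positives, quantile=0.95):
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives)
                for i in range(dim)]
    dists = sorted(sum((x - c) ** 2 for x, c in zip(p, centroid)) ** 0.5
                   for p in positives)
    # Radius within which roughly `quantile` of training points fall.
    radius = dists[min(int(quantile * len(dists)), len(dists) - 1)]
    return centroid, radius

def predict(model, point):
    centroid, radius = model
    d = sum((x - c) ** 2 for x, c in zip(point, centroid)) ** 0.5
    return "inlier" if d <= radius else "outlier"

# Only positive examples are available at training time.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
model = fit_one_class(positives)
```

The survey's taxonomy covers far richer descriptions of the positive class (one-class SVMs, density models, reconstruction methods), but they all share this structure: a boundary fitted to the positive class, with no negative examples to calibrate against.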
How to Find More Supernovae with Less Work: Object Classification Techniques for Difference Imaging
We present the results of applying new object classification techniques to
difference images in the context of the Nearby Supernova Factory supernova
search. Most current supernova searches subtract reference images from new
images, identify objects in these difference images, and apply simple threshold
cuts on parameters such as statistical significance, shape, and motion to
reject objects such as cosmic rays, asteroids, and subtraction artifacts.
Although most static objects subtract cleanly, even a very low false positive
detection rate can lead to hundreds of non-supernova candidates that must be
vetted by human inspection before triggering additional follow-up. In comparison
to simple threshold cuts, more sophisticated methods such as Boosted Decision
Trees, Random Forests, and Support Vector Machines provide dramatically better
object discrimination. At the Nearby Supernova Factory, we reduced the number
of non-supernova candidates by a factor of 10 while increasing our supernova
identification efficiency. Methods such as these will be crucial for
maintaining a reasonable false positive rate in the automated transient alert
pipelines of upcoming projects such as PanSTARRS and LSST.
Comment: 25 pages; 6 figures; submitted to Ap
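The baseline vetting the abstract contrasts with machine-learning classifiers is a conjunction of hard per-feature cuts, which can be sketched as below. The feature names and cut values are illustrative assumptions, not the Nearby Supernova Factory's actual selection.

```python
# Simple threshold-cut vetting of difference-image candidates.
# A candidate survives only if every cut passes (a hard AND).
# Features and thresholds are hypothetical.

CUTS = {
    "significance": lambda v: v >= 5.0,          # detection significance (sigma)
    "fwhm_ratio":   lambda v: 0.7 <= v <= 1.5,   # shape relative to nearby stars
    "motion_pix":   lambda v: v < 0.5,           # reject moving objects (asteroids)
}

def passes_cuts(candidate):
    return all(test(candidate[name]) for name, test in CUTS.items())

candidates = [
    {"significance": 8.2, "fwhm_ratio": 1.1, "motion_pix": 0.1},  # SN-like
    {"significance": 3.1, "fwhm_ratio": 1.0, "motion_pix": 0.0},  # too faint
    {"significance": 9.0, "fwhm_ratio": 0.2, "motion_pix": 0.0},  # cosmic-ray-like
    {"significance": 7.5, "fwhm_ratio": 1.0, "motion_pix": 2.3},  # asteroid-like
]
survivors = [c for c in candidates if passes_cuts(c)]
```

Methods such as Boosted Decision Trees, Random Forests, and Support Vector Machines replace each independent hard cut with learned combinations of features, which is why they discriminate so much better at the same efficiency.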
On a continuation approach in Tikhonov regularization and its application in piecewise-constant parameter identification
We present a new approach to convexification of the Tikhonov regularization
using a continuation method strategy. We embed the original minimization
problem into a one-parameter family of minimization problems. Both the penalty
term and the minimizer of the Tikhonov functional become dependent on a
continuation parameter.
In this way we can independently treat the two main roles of the regularization
term: stabilization of the ill-posed problem and introduction of a priori
knowledge. For a zero continuation parameter we solve a relaxed
regularization problem, which stabilizes the ill-posed problem in a weaker
sense. The continuation method then recasts the problem to the original
minimization, so that the a priori knowledge is enforced.
We apply this approach in the context of topology-to-shape geometry
identification, where it allows us to avoid convergence of gradient-based
methods to local minima. We present illustrative results for magnetic
induction tomography, an example of a PDE-constrained inverse problem.
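The construction can be written schematically in our own notation (the paper's exact functional may differ):

```latex
\min_{u}\; \Phi_t(u) \;=\; \big\| F(u) - y^{\delta} \big\|^{2}
  \;+\; \alpha\, R_t(u), \qquad t \in [0,1],
```

where $F$ is the forward operator, $y^{\delta}$ the noisy data, and $R_t$ the continuation-dependent penalty: $R_0$ is a relaxed (e.g. convex) penalty that stabilizes the ill-posed problem in the weaker sense, $R_1$ is the original penalty encoding the a priori (here piecewise-constant) structure, and the minimizer $u_t$ is tracked as $t$ increases from $0$ to $1$.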
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model,
called C3L, which retrieves natural subgroups of given categories. In contrast
to other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as a
module in practical expert systems to detect clusters that combine expert
knowledge with the true distribution of the data. Moreover, it can be used to
improve the results of less flexible clustering techniques, such as projection
pursuit clustering. The paper presents an extensive theoretical analysis of the
model and a fast algorithm for its efficient optimization. Experimental results
show that C3L finds a high-quality clustering model that can be applied to
discover meaningful groups in partially classified data.
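"Leakage" can be made concrete as a measurable quantity: the fraction of initially categorized points whose final cluster is dominated by a different category. This is our illustrative reading of a user-defined leakage level, not the authors' exact definition in C3L.

```python
# Illustrative leakage measure between an initial categorization and
# a resulting clustering (our reading, not C3L's exact definition).
from collections import Counter

def leakage(categories, clusters):
    """categories[i] and clusters[i] are the labels of point i."""
    # Majority category of each cluster.
    majority = {}
    for cl in set(clusters):
        members = [cat for cat, c in zip(categories, clusters) if c == cl]
        majority[cl] = Counter(members).most_common(1)[0][0]
    # Points whose cluster's majority category disagrees with their own.
    leaked = sum(1 for cat, cl in zip(categories, clusters)
                 if majority[cl] != cat)
    return leaked / len(categories)

cats     = ["X", "X", "X", "Y", "Y", "Y"]
clusters = [ 0,   0,   1,   1,   1,   2 ]
leak = leakage(cats, clusters)  # one X point leaked into a Y-dominated cluster
```

A semi-supervised model with a leakage level in this spirit would search for the best-fitting mixture among clusterings whose leakage does not exceed the user-specified bound.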