3,934 research outputs found

    Developments in the theory of randomized shortest paths with a comparison of graph node distances

    There have lately been several proposals for parametrized graph node distances that generalize the shortest path distance and the commute time (or resistance) distance. The need for such distances has arisen from the observation that the common distances mentioned above often fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we propose a new distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity: it defines a metric and, in addition, satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then compare a set of generalized distances that interpolate between the shortest path distance and the commute time (or resistance) distance, focusing on their applicability to graph node clustering and classification. The comparison shows that the parametrized distances generally perform well in these tasks; in particular, the results obtained with the free energy distance are among the best in all the experiments. Comment: 30 pages, 4 figures, 3 tables
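The closed-form computation mentioned in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the construction used in the related randomized-shortest-path literature, where Z = (I - W)^{-1} with W = P_ref ∘ exp(-θC), the directed free energy is φ(i,j) = -(1/θ) log(z_ij / z_jj), and the distance is the symmetrized φ.

```python
import numpy as np

def free_energy_distance(A, C, theta=1.0):
    """Symmetrized free energy distance on a strongly connected graph.

    A: nonnegative adjacency/weight matrix; C: edge cost matrix;
    theta: inverse-temperature parameter. Formulas are assumptions
    taken from the randomized-shortest-path literature, not quoted
    from this paper.
    """
    # Reference random-walk transition probabilities.
    P_ref = A / A.sum(axis=1, keepdims=True)
    # Killed walk: transition probabilities discounted by edge costs.
    W = P_ref * np.exp(-theta * C)
    Z = np.linalg.inv(np.eye(len(A)) - W)
    # Directed free energies phi(i, j) = -(1/theta) * log(z_ij / z_jj);
    # broadcasting divides each column j by its diagonal entry z_jj.
    phi = -np.log(Z / np.diag(Z)) / theta
    # Symmetrize to obtain a distance between node pairs.
    return (phi + phi.T) / 2

# Toy example: unweighted triangle graph with unit edge costs.
A = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
C = np.where(A > 0, 1.0, 0.0)
D = free_energy_distance(A, C, theta=5.0)
```

For large θ the distance approaches the shortest path cost; for small θ it moves toward commute-time-like behavior, which is the interpolation the abstract describes.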

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper, we present a unified view of the general problem of OCC through a taxonomy of study for OCC problems, based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques, and methodologies, with a focus on their significance, limitations, and applications. We conclude by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
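One concrete instance of the setting the abstract describes, training on the positive class alone, is the one-class SVM, one of the technique families such surveys typically cover. A minimal sketch with scikit-learn's `OneClassSVM` on synthetic data (all data and parameter choices here are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Training data: positive class only -- no negative examples exist.
positives = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Fit a boundary around the positive class; nu bounds the fraction of
# training points treated as outliers.
occ = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(positives)

# At test time, +1 means inlier (positive class), -1 means outlier/novelty.
inlier = occ.predict(np.array([[0.0, 0.0]]))
outlier = occ.predict(np.array([[8.0, 8.0]]))
```

The key point matches the abstract: the decision boundary is learned entirely from positive examples, and anything falling outside it is flagged as not belonging to the class.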

    Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

    The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. However, the literature contains differing interpretations, and commonly used software systems implement the Ward agglomerative algorithm differently, including with differing expressions of the agglomerative criterion. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method. Comment: 20 pages, 21 citations, 4 figures
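As one example of the implementations the abstract compares, SciPy's `ward` linkage merges, at each step, the pair of clusters whose fusion gives the minimum increase in the total within-cluster error sum of squares. A short sketch on synthetic data (the data and cluster count are illustrative assumptions):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated Gaussian blobs of 20 points each.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])

# SciPy's Ward linkage expects raw observations (it computes Euclidean
# distances internally), not a precomputed distance matrix.
Z = linkage(X, method="ward")

# Cut the dendrogram into two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
```

The abstract's point is exactly that this behavior is not uniform across software: different packages accept different inputs (raw data vs. squared or unsquared distances) and express the criterion differently, which can yield different dendrograms for the "same" method.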

    Forecasting loss given default with the nearest neighbor algorithm

    Mestrado em Matemática Financeira (Master's in Financial Mathematics). In recent years, forecasting loss given default (LGD) has been a major challenge in the field of credit risk management. Practitioners and academic researchers have focused on the study of this particular risk dimension. Despite all the different approaches that have been developed and published so far, LGD forecasting remains an area of intense academic study, and no consensual solution has yet emerged in the banking industry. This paper presents an LGD forecasting approach based on a simple and intuitive machine learning algorithm: the nearest neighbor algorithm. To evaluate the performance of this non-parametric technique, appropriate evaluation metrics are used to compare it to a more "classical" parametric model and to the use of historical recovery rates to predict LGD.
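The core idea, predicting a loan's LGD from the observed LGDs of its most similar historical defaults, can be sketched with scikit-learn's `KNeighborsRegressor`. The features, data, and choice of k below are invented for illustration and are not taken from the thesis:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
# Synthetic historical defaults: columns are hypothetical features,
# e.g. [loan-to-value ratio, borrower score], both scaled to [0, 1].
X_hist = rng.uniform(0.0, 1.0, size=(500, 2))
# Synthetic LGD: higher loan-to-value -> higher loss, plus noise,
# clipped to the valid [0, 1] range.
lgd_hist = np.clip(0.8 * X_hist[:, 0] + 0.1 * rng.normal(size=500), 0.0, 1.0)

# Predict LGD as the mean LGD of the k nearest historical defaults.
model = KNeighborsRegressor(n_neighbors=10).fit(X_hist, lgd_hist)
pred = model.predict(np.array([[0.9, 0.5]]))  # a high loan-to-value loan
```

This is the non-parametric baseline in spirit: no functional form is assumed for the LGD distribution, which is exactly what distinguishes it from the "classical" parametric models it is compared against.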

    Survey of data mining approaches to user modeling for adaptive hypermedia

    Get PDF
    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Among the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of that data, the noise within it, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques can handle large amounts of data and process uncertainty; these characteristics make them suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines indicating which techniques may be used most effectively according to the task implemented by the application.