Qualitative Effects of Knowledge Rules in Probabilistic Data Integration
One of the problems in data integration is data overlap: the fact that different data sources have data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates from the integration result or to resolve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, so that the integration result can already be used meaningfully. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set, in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on integration quality. It shows that our approach indeed reduces development effort, rather than merely shifting it to rule definition and threshold tuning: setting rough safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used.
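The threshold-based split described above can be sketched as follows. The similarity function, the threshold values, and the representation of uncertain matches below are illustrative assumptions, not the paper's actual system:

```python
def integrate(pairs, similarity, t_low=0.3, t_high=0.9):
    """Entity resolution with rough safe thresholds, keeping residual
    uncertainty. Pairs scoring above t_high are merged, pairs below
    t_low are kept apart, and the hard in-between cases are stored as
    probabilistic alternatives instead of being resolved by hand.
    Thresholds and the similarity measure are placeholders.
    """
    merged, distinct, uncertain = [], [], []
    for a, b in pairs:
        s = similarity(a, b)
        if s >= t_high:
            merged.append((a, b))          # confidently the same entity
        elif s <= t_low:
            distinct.append((a, b))        # confidently different entities
        else:
            # keep both possible worlds, weighted by the similarity score
            uncertain.append({"pair": (a, b), "same": s, "different": 1 - s})
    return merged, distinct, uncertain
```

A probabilistic database would store the `uncertain` alternatives as mutually exclusive possible worlds, so queries can already run meaningfully over the partially resolved data.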
Failures Pave the Way: Enhancing Large Language Models through Tuning-free Rule Accumulation
Large Language Models (LLMs) have showcased impressive performance. However,
due to their inability to capture relationships among samples, these frozen
LLMs inevitably keep repeating similar mistakes. In this work, we propose our
Tuning-free Rule Accumulation (TRAN) framework, which guides LLMs in improving
their performance by learning from previous mistakes. Considering data arrives
sequentially, LLMs gradually accumulate rules from incorrect cases, forming a
rule collection. These rules are then utilized by the LLMs to avoid making
similar mistakes when processing subsequent inputs. Moreover, the rules remain
independent of the primary prompts, seamlessly complementing prompt design
strategies. Experimentally, we show that TRAN improves over recent baselines by
a large margin. Comment: This paper is accepted by the EMNLP 2023 Main Conference.
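A minimal sketch of the rule-accumulation loop described above, assuming a callable `llm` and exact-match correctness checks; the prompts and helper names here are illustrative, not the TRAN paper's actual interface:

```python
def run_with_rules(llm, inputs, labels):
    """Tuning-free rule accumulation, sketched: when the frozen model
    errs, ask it to distill a short corrective rule from the failure,
    then prepend the accumulated rules to later prompts. The rules stay
    separate from the primary prompt, so they compose with any prompt
    design strategy.
    """
    rules = []
    for x, y in zip(inputs, labels):
        prompt = "\n".join(rules + [x])    # rules complement the main prompt
        pred = llm(prompt)
        if pred != y:                      # mistake: accumulate a new rule
            rule = llm("Input: " + x + "\nWrong answer: " + pred +
                       "\nCorrect: " + y + "\nState a short rule to avoid this mistake.")
            rules.append(rule)
    return rules
```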
Designing fuzzy rule based classifier using self-organizing feature map for analysis of multispectral satellite images
We propose a novel scheme for designing a fuzzy rule based classifier. An SOFM based method is used for generating a set of prototypes, which is then used to generate a set of fuzzy rules. Each rule represents a region in the feature space that we call the context of the rule. The rules are tuned with respect to their context. We justify that the reasoning scheme may differ across contexts, leading to context-sensitive inferencing. To realize context-sensitive inferencing we use a softmin operator with a tunable parameter. The proposed scheme is tested on several multispectral satellite image data sets and the performance is found to be much better than the results reported in the literature. Comment: 23 pages, 7 figures
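The softmin operator with a tunable parameter admits a compact sketch. The exponential-weighting form below is a common definition and a plausible reading of the abstract; the paper's exact parameterization is not reproduced here:

```python
import math

def softmin(values, q=2.0):
    """Soft minimum over rule-antecedent membership values, with a
    tunable parameter q. As q grows the result approaches min(values);
    at q = 0 it degenerates to the plain average, so tuning q per rule
    context changes the reasoning scheme (context-sensitive inferencing).
    """
    num = sum(v * math.exp(-q * v) for v in values)
    den = sum(math.exp(-q * v) for v in values)
    return num / den
```

Tuning q independently for each rule's context is what makes the inferencing context sensitive: the same antecedent memberships can be aggregated strictly (min-like) in one region of the feature space and leniently (average-like) in another.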
Self-tuning experience weighted attraction learning in games
Self-tuning experience weighted attraction (EWA) is a one-parameter theory of learning in
games. It addresses a criticism that an earlier model (EWA) has too many parameters, by
fixing some parameters at plausible values and replacing others with functions of experience
so that they no longer need to be estimated. Consequently, it is econometrically simpler
than the popular weighted fictitious play and reinforcement learning models.
The functions of experience which replace free parameters “self-tune” over time, adjusting
in a way that selects a sensible learning rule to capture subjects’ choice dynamics. For
instance, the self-tuning EWA model can shift from weighted fictitious play to averaging reinforcement learning as subjects equilibrate and learn to ignore inferior foregone payoffs. The theory was tested on seven different games, and compared to the earlier parametric
EWA model and a one-parameter stochastic equilibrium theory (QRE). Self-tuning
EWA does as well as EWA in predicting behavior in new games, even though it has fewer
parameters, and fits reliably better than the QRE equilibrium benchmark.
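For context, the parametric EWA attraction update that self-tuning EWA starts from can be sketched as below; in the self-tuning variant, phi and delta become functions of experience instead of free parameters. The self-tuning functions themselves are not reproduced here:

```python
def ewa_update(A, N, chosen, payoffs, phi=0.9, delta=0.5, kappa=0.0):
    """One step of the parametric EWA attraction update (Camerer & Ho's
    form). A holds each strategy's current attraction, N the experience
    weight, chosen the index of the strategy actually played, and
    payoffs[j] the payoff strategy j would have earned this round.
    Foregone payoffs are weighted by delta; the chosen strategy's payoff
    gets full weight.
    """
    N_new = phi * (1 - kappa) * N + 1
    A_new = []
    for j, a in enumerate(A):
        weight = 1.0 if j == chosen else delta
        A_new.append((phi * N * a + weight * payoffs[j]) / N_new)
    return A_new, N_new
```

With delta near 1 and cumulative updating this behaves like weighted fictitious play; with delta near 0 it ignores foregone payoffs and behaves like averaging reinforcement learning, which is exactly the spectrum the self-tuning functions move along.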
Autoregressive time series prediction by means of fuzzy inference systems using nonparametric residual variance estimation
We propose an automatic methodology framework for short- and long-term prediction of time series by means of fuzzy inference systems. In this methodology, fuzzy techniques and statistical techniques for nonparametric residual variance estimation are combined in order to build autoregressive predictive models implemented as fuzzy inference systems. Nonparametric residual variance estimation plays a key role in driving the identification and learning procedures. Concrete criteria and procedures within the proposed methodology framework are applied to a number of time series prediction problems. The learning-from-examples method introduced by Wang and Mendel (W&M) is used for identification. The Levenberg–Marquardt (L–M) optimization method is then applied for tuning. The W&M method produces compact and potentially accurate inference systems when applied after a proper variable selection stage. The L–M method yields the best compromise between accuracy and interpretability of results among a set of alternatives. Delta test based residual variance estimations are used to select the best subset of inputs to the fuzzy inference systems as well as the number of linguistic labels for the inputs. Experiments on a diverse set of time series prediction benchmarks compare the proposed methodology against least-squares support vector machines (LS-SVM), optimally pruned extreme learning machine (OP-ELM), and k-NN based autoregressors. The advantages of the proposed methodology are shown in terms of linguistic interpretability, generalization capability and computational cost. Furthermore, fuzzy models are shown to be consistently more accurate for prediction in the case of time series coming from real-world applications. Funding: Ministerio de Ciencia e Innovación TEC2008-04920; Junta de Andalucía P08-TIC-03674, IAC07-I-0205:33080, IAC08-II-3347:5626
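The Delta test used for input selection has a compact standard form: the residual variance is estimated as half the mean squared difference between each output and the output of its nearest neighbour in input space. The brute-force implementation below assumes squared Euclidean distance; the paper's exact variant may differ:

```python
def delta_test(X, y):
    """Nonparametric residual (noise) variance estimate via the Delta test.
    X is a list of input tuples, y the corresponding outputs.
    O(n^2) brute-force nearest-neighbour search, for clarity.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # nearest neighbour of X[i] among the other points (squared Euclidean)
        j = min((k for k in range(n) if k != i),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(X[i], X[k])))
        total += (y[j] - y[i]) ** 2
    return total / (2 * n)
```

An input subset (or number of linguistic labels) with a lower Delta test estimate leaves less unexplained output variance, which is the criterion driving the identification stage described above.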
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows to trade-off model complexity with
goodness of fit, by which overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
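The MDL trade-off this abstract describes (model cost plus data cost, in bits) can be sketched as below. The per-rule and per-condition bit costs and the data structures are illustrative placeholders, not Classy's actual code-length definitions:

```python
import math
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict  # feature -> required value; {} matches everything (default rule)
    probs: dict       # class label -> predicted probability

def predict(rule_list, x):
    """Class distribution of the first matching rule (rule-list semantics)."""
    for rule in rule_list:
        if all(x.get(f) == v for f, v in rule.conditions.items()):
            return rule.probs
    raise ValueError("rule list must end with a default (empty-condition) rule")

def description_length(rule_list, data):
    """Two-part MDL score: bits to encode the model plus bits to encode
    the labels given the model. A longer rule list costs more model bits
    but can fit the labels in fewer data bits, so minimizing the total
    trades complexity against goodness of fit without hyperparameters.
    """
    model_bits = sum(1 + 8 * len(r.conditions) for r in rule_list)
    data_bits = sum(-math.log2(predict(rule_list, x)[y]) for x, y in data)
    return model_bits + data_bits
```

A greedy search in this spirit would repeatedly add the candidate rule that most reduces the total description length and stop when no candidate helps, which is how overfitting is avoided without hyperparameter tuning.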