Applying Associative Classifier PGN for Digitised Cultural Heritage Resource Discovery
Resource discovery is one of the key services in digitised cultural
heritage collections. It requires intelligent mining in heterogeneous digital
content as well as capabilities in large scale performance; this explains the
recent advances in classification methods. Associative classifiers are convenient
data mining tools in the field of cultural heritage because they can take into
account specific combinations of attribute values. Usually, associative
classifiers prioritize support over
confidence. The proposed classifier PGN questions this common approach and
focuses on confidence first by retaining only 100% confidence rules. The
classification tasks in the field of cultural heritage usually deal with data sets
with many class labels. This variety is caused by the richness of accumulated
culture during the centuries. Comparisons of classifier PGN with other
classifiers, such as OneR, JRip and J48, show the competitiveness of PGN in
recognizing multi-class datasets on collections of masterpieces from different
West and East European Fine Art authors and movements.
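The confidence-first criterion mentioned in the abstract above can be illustrated with a minimal sketch. It computes support and confidence for a few candidate rules over a toy data set and keeps only the rules with 100% confidence. The records, the candidate rules, and the helper name `rule_stats` are hypothetical; this is not the PGN algorithm itself, only the filtering criterion the abstract describes.

```python
from fractions import Fraction

# Toy records: (set of attribute values, class label). All values are made up.
records = [
    ({"medium=oil", "era=baroque"}, "author_A"),
    ({"medium=oil", "era=baroque"}, "author_A"),
    ({"medium=oil", "era=romantic"}, "author_B"),
    ({"medium=fresco", "era=baroque"}, "author_A"),
]

def rule_stats(antecedent, label, records):
    """Return (support, confidence) of the rule antecedent -> label."""
    # Class labels of all records that contain every item of the antecedent.
    covered = [cls for items, cls in records if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = Fraction(hits, len(records))
    confidence = Fraction(hits, len(covered)) if covered else Fraction(0)
    return support, confidence

# Hypothetical candidate rules (antecedent, predicted class).
candidates = [
    (frozenset({"era=baroque"}), "author_A"),
    (frozenset({"medium=oil"}), "author_A"),
    (frozenset({"era=romantic"}), "author_B"),
]

# Confidence-first filter: keep only rules that are never contradicted.
exact_rules = [
    (ante, label)
    for ante, label in candidates
    if rule_stats(ante, label, records)[1] == 1
]
```

In this toy data the rule on `medium=oil` is discarded even though its support is higher than that of the rule on `era=romantic`, which is the kind of trade-off a confidence-first classifier makes.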
Transfer learning through greedy subset selection
We study the binary transfer learning problem, focusing on how to select sources from a large pool and how to combine them to yield good performance on a target task. In particular, we consider the transfer learning setting where one does not have direct access to the source data, but rather employs the source hypotheses trained from them. Building on the literature on the best subset selection problem, we propose an efficient algorithm that selects relevant source hypotheses and feature dimensions simultaneously. On three computer vision datasets we achieve state-of-the-art results, substantially outperforming transfer learning and popular feature selection baselines in a small-sample setting. Also, we theoretically prove that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
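As a rough illustration of greedy subset selection in general, the sketch below uses plain matching pursuit: at each step it picks the column that most reduces the squared residual, then removes that column's contribution. This is a swapped-in textbook technique, not the algorithm proposed in the paper. The function name `greedy_select` and the idea of treating each column as one source hypothesis' predictions on the target samples are assumptions made for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def greedy_select(columns, y, k):
    """Pick up to k column indices by matching pursuit.

    columns: list of columns (each a list of numbers). In a transfer setting a
    column could hold one source hypothesis' outputs on the target samples
    (an assumption of this sketch).
    y: target values, same length as each column.
    """
    residual = list(y)
    selected = []
    for _ in range(k):
        best, best_score = None, 0.0
        for j, col in enumerate(columns):
            if j in selected:
                continue
            norm_sq = dot(col, col)
            if norm_sq == 0:
                continue
            # Drop in squared residual when regressing on this column alone.
            score = dot(col, residual) ** 2 / norm_sq
            if score > best_score:
                best, best_score = j, score
        if best is None:  # no remaining column explains any residual
            break
        col = columns[best]
        coef = dot(col, residual) / dot(col, col)
        residual = [r - coef * c for r, c in zip(residual, col)]
        selected.append(best)
    return selected

# Toy usage: the target depends on columns 0 and 2 only.
toy_columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
toy_target = [3, 0, 1, 0]
chosen = greedy_select(toy_columns, toy_target, k=3)
```

The loop stops early once no unused column reduces the residual, so asking for three columns here still returns only the two relevant ones.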
I-prune: Item selection for associative classification
Associative classification is characterized by accurate models and high model generation time. Most time is spent in extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.
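A minimal sketch of chi-square-based item pruning follows, assuming each item is scored against one target class through a 2x2 contingency table. The function names, the toy record layout, and the threshold of 3.84 (the 5% critical value for one degree of freedom) are assumptions for illustration; the abstract does not specify how I-prune applies the measure.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: records with the item and the class, b: item without the class,
    c: class without the item, d: neither.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a row or column is empty: no evidence of association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def prune_items(records, target, threshold=3.84):
    """Return the items whose chi-square against `target` reaches the threshold."""
    items = set().union(*(its for its, _ in records))
    kept = set()
    for item in items:
        a = sum(1 for its, cls in records if item in its and cls == target)
        b = sum(1 for its, cls in records if item in its and cls != target)
        c = sum(1 for its, cls in records if item not in its and cls == target)
        d = sum(1 for its, cls in records if item not in its and cls != target)
        if chi_square_2x2(a, b, c, d) >= threshold:
            kept.add(item)
    return kept

# Toy usage: "x" separates the classes, "noise" appears in every record.
toy_records = [({"x", "noise"}, "pos")] * 10 + [({"noise"}, "neg")] * 10
kept_items = prune_items(toy_records, "pos")
```

Items that occur in every record carry no information about the class, so their statistic is zero and they are pruned before any rule is generated from them.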
Highlighter: automatic highlighting of electronic learning documents
Electronic textual documents are among the most popular teaching content accessible through e-learning platforms. Teachers or learners with different levels of knowledge can access the platform and highlight portions of textual content which are deemed particularly relevant. The highlighted documents can be shared with the learning community in support of oral lessons or individual learning. However, highlights are often incomplete or unsuitable for learners with different levels of knowledge. This paper addresses the problem of predicting new highlights of partly highlighted electronic learning documents. With the goal of enriching teaching content with additional features, text classification techniques are exploited to automatically analyze portions of documents enriched with manual highlights made by users with different levels of knowledge and to generate ad hoc prediction models. Then, the generated models are applied to the remaining content to suggest highlights. To improve the quality of the learning experience, learners may explore highlights generated by models tailored to different levels of knowledge. We tested the prediction system on real and benchmark documents highlighted by domain experts and we compared the performance of various classifiers in generating highlights. The achieved results demonstrated the high accuracy of the predictions and the applicability of the proposed approach to real teaching documents.
Can human association norm evaluate latent semantic analysis?
This paper compares a word association norm created in a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm and lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.
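For readers unfamiliar with the Church and Hanks approach, the sketch below computes their association ratio, log2(P(x, y) / (P(x) P(y))), from co-occurrence counts in a fixed window. The function name, the default window size of 5, and the simple normalisation by corpus length are assumptions of this sketch and may differ from the setup used in the paper.

```python
import math
from collections import Counter

def association_ratios(tokens, window=5):
    """Return {(x, y): log2(P(x, y) / (P(x) * P(y)))} for ordered pairs
    where y follows x within `window` tokens."""
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        # Count every word that follows x inside the window.
        for y in tokens[i + 1 : i + 1 + window]:
            pair_counts[(x, y)] += 1
    return {
        (x, y): math.log2((c / n) / ((word_counts[x] / n) * (word_counts[y] / n)))
        for (x, y), c in pair_counts.items()
    }

# Toy usage on a four-token "corpus".
ratios = association_ratios(["a", "b", "a", "b"], window=1)
```

Sorting the pairs that start with a given stimulus word by this score yields an association list that can be set against the human norm.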
Semantic 3D scene interpretation: A framework combining optimal neighborhood size selection with relevant features
3D scene analysis by automatically assigning 3D points a semantic label has become an issue of major interest in recent years. Whereas the tasks of feature extraction and classification have been in the focus of research, the idea of using only relevant and more distinctive features extracted from optimal 3D neighborhoods has only rarely been addressed in 3D lidar data processing. In this paper, we focus on the interleaved issue of extracting relevant, but not redundant features and increasing their distinctiveness by considering the respective optimal 3D neighborhood of each individual 3D point. We present a new, fully automatic and versatile framework consisting of four successive steps: (i) optimal neighborhood size selection, (ii) feature extraction, (iii) feature selection, and (iv) classification. In a detailed evaluation which involves 5 different neighborhood definitions, 21 features, 6 approaches for feature subset selection and 2 different classifiers, we demonstrate that optimal neighborhoods for individual 3D points significantly improve the results of scene interpretation and that the selection of adequate feature subsets may even further increase the quality of the derived results.
MPGN – An Approach for Discovering Class Association Rules
This article presents some of the results of the Ph.D. thesis Class Association Rule Mining
Using MultiDimensional Numbered Information Spaces by Iliya Mitov (Institute of Mathematics
and Informatics, BAS), successfully defended at Hasselt University, Faculty of Science on 15
November 2011 in Belgium. The article briefly presents some results achieved within the PhD project R1876 Intelligent Systems’ Memory Structuring Using Multidimensional Numbered Information Spaces, successfully defended at Hasselt University. The main goal of this article is to show the possibilities of using multidimensional numbered information spaces in data mining processes, using as an example the implementation of one associative classifier, called MPGN (Multilayer Pyramidal Growing Networks).
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
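Of the three encodings named in the abstract above, the bag-of-N-grams encoding is the simplest to sketch: a word is represented by the set of character N-grams it contains. The function name and the maximum N-gram length of 4 are assumptions for illustration; the abstract does not state how the N-gram vocabulary is built or how the network predicts it.

```python
def bag_of_ngrams(word, max_n=4):
    """Return the set of all character n-grams of `word` for 1 <= n <= max_n."""
    return {
        word[i : i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    }

# Toy usage: unigrams and bigrams of a short word.
example = bag_of_ngrams("cat", max_n=2)
```

Because the result is a set, the encoding discards N-gram order and repetition, which is what distinguishes it from the character sequence encoding.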
Scalable Greedy Algorithms for Transfer Learning
In this paper we consider the binary transfer learning problem, focusing on
how to select and combine sources from a large pool to yield a good performance
on a target task. Constraining our scenario to the real world, we do not assume
direct access to the source data, but rather we employ the source hypotheses
trained from them. We propose an efficient algorithm that selects relevant
source hypotheses and feature dimensions simultaneously, building on the
literature on the best subset selection problem. Our algorithm achieves
state-of-the-art results on three computer vision datasets, substantially
outperforming both transfer learning and popular feature selection baselines in
a small-sample setting. We also present a randomized variant that achieves the
same results at a computational cost independent of the number of source
hypotheses and feature dimensions. Finally, we theoretically prove that, under
reasonable assumptions on the source hypotheses, our algorithm can learn
effectively from few examples.