51 research outputs found

Mining Characteristic Patterns for Comparative Music Corpus Analysis

Author: Conklin Darrell
Neubarth Kerstin
Publication venue: 'MDPI AG'
Publication date: 14/03/2020

A core issue of computational pattern mining is the identification of interesting patterns. When mining music corpora organized into classes of songs, patterns may be of interest because they are characteristic, describing prevalent properties of classes, or because they are discriminant, capturing distinctive properties of classes. Existing work in computational music corpus analysis has focused on discovering discriminant patterns. This paper studies characteristic patterns, investigating the behavior of different pattern interestingness measures in balancing coverage and discriminability of classes in top-k pattern mining and in individual top-ranked patterns. Characteristic pattern mining is applied to the collection of Native American music by Frances Densmore, and the discovered patterns are shown to be supported by Densmore’s own analyses.

Multidisciplinary Digital Publishing Institute

Archivo Digital para la Docencia y la Investigación

Training spamassassin with active semi-supervised learning

Author: Fabio Roli
Giorgio Fumera
Jun-ming Xu
Zhi-hua Zhou
Publication date: 01/01/2009

Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filters. To address this issue, active learning and semi-supervised learning techniques can be used. Many spam filters allow the user to give feedback on personal emails automatically labeled during filter operation, and some filters include a self-training mechanism to exploit the large number of unlabeled emails collected during filter operation. However, users are usually willing to label only a few emails, and the benefits of self-training techniques are limited. In this paper we propose an active semi-supervised learning method to better exploit unlabeled emails, which can be easily implemented as a plug-in in real spam filters. Our method is based on clustering unlabeled emails, querying the label of one email per cluster, and propagating that label to the most similar emails of the same cluster. The effectiveness of our method is evaluated using the well-known open-source SpamAssassin filter, on a large and publicly available corpus of real legitimate and spam emails.

CiteSeerX

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova
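
The cluster, query, and propagate loop described in the SpamAssassin abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name a clustering algorithm or a similarity rule, so the k-means routine, the `radius` cutoff for "most similar", and the names `kmeans`, `cluster_query_propagate`, and `oracle` are all assumptions made for illustration. Emails are assumed to be already encoded as numeric feature tuples.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on tuples of floats (an assumed clustering choice)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment, centroids


def cluster_query_propagate(points, k, oracle, radius):
    """Label unlabeled points with one user query per cluster.

    oracle(i) stands in for the user and returns the true label of point i.
    radius is a hypothetical similarity cutoff: only cluster members within
    this distance of the queried point inherit its label.
    """
    assignment, centroids = kmeans(points, k)
    labels = [None] * len(points)
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if not members:
            continue
        # Query the member closest to the cluster centre.
        queried = min(members, key=lambda i: math.dist(points[i], centroids[c]))
        label = oracle(queried)
        # Propagate the answer to sufficiently similar members only.
        for i in members:
            if math.dist(points[i], points[queried]) <= radius:
                labels[i] = label
    return labels
```

Points that stay `None` are those judged too dissimilar to the queried email; in a real filter they would simply remain unlabeled. The propagated labels could then be fed to the filter's own training routine.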

Discovering a taste for the unusual: exceptional models for preference mining

Author: Alípio Mário Jorge
Arno Knobbe
Carlos Soares
Cláudio Rebelo de Sá
CR Sá de
E Hüllermeier
F Chiclana
F M Harper
J Chomicki
L Umek
M Leeuwen van
N Jin
N Lavrac
P Brazdil
Paulo Azevedo
PJ Azevedo
V Svendová
W Duivesteijn
WD Cook
Wouter Duivesteijn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Exceptional preferences mining (EPM) is a crossover between two subfields of data mining: local pattern mining and preference learning. EPM can be seen as a local pattern mining task that finds subsets of observations where some preference relations between labels significantly deviate from the norm. It is a variant of subgroup discovery, with rankings of labels as the target concept. We employ several quality measures that highlight subgroups featuring exceptional preferences, where the focus of what constitutes 'exceptional' varies with the quality measure: two measures look for exceptional overall ranking behavior, one measure indicates whether a particular label stands out from the rest, and a fourth measure highlights subgroups with unusual pairwise label ranking behavior. We explore a few datasets and compare with existing techniques. The results confirm that the new task EPM can deliver interesting knowledge. This research has received funding from the ECSEL Joint Undertaking, the framework programme for research and innovation Horizon 2020 (2014-2020) under Grant Agreement Number 662189-MANTIS-2014-1.

Universidade do Minho: RepositoriUM

Repository TU/e

Crossref

Pure OAI Repository

Leiden University Scholary Publications

Learning Interpretable Rules for Multi-label Classification

Author: A Gabriel
AA Freitas
AJ Knobbe
B Liu
B Minnaert
D Malerba
E Gibaja
E Loza Mencía
E Montañés
F Charte
F Herrera
F Janssen
F Thabtah
G Bosc
G Tsoumakas
Grigorios Tsoumakas
H Allahyari
J Arunadevi
J Demšar
J Fürnkranz
J Han
J Hipp
J Read
JN Sulzmann
K Dembczyński
L Chekina
L Raedt De
LE Sucar
M Atzmüller
M Beckerle
M Friedman
M Zhang
Miltiadis Allamanis
MR Boutell
P Kralj Novak
PJ Hayes
R Senge
RM Cameron-Jones
Shantanu Godbole
W Duivesteijn
W Waegeman
WW Cohen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2018

Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area. Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further information.

arXiv.org e-Print Archive

TUbiblio

Crossref

A service oriented architecture to provide data mining services for non-expert data miners

Author: García Saiz Diego
Zorrilla Pantaleón Marta E.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2013

In today's competitive market, companies need to use knowledge discovery techniques to make better, more informed decisions. But these techniques are out of reach for most users, as the knowledge discovery process requires a great deal of expertise. Additionally, business intelligence vendors are moving their systems to the cloud in order to provide services which offer companies cost savings, better performance and faster access to new applications. This work joins both facets. It describes a data mining service addressed to non-expert data miners which can be delivered as Software-as-a-Service. Its main advantage is that, by simply indicating where the data file is, the service itself is able to perform the entire process. © 2012 Elsevier B.V. All rights reserved.

Crossref

UCrea

Active learning and the Irish treebank

Author: Dras Mark
Foster Jennifer
Lynn Teresa
Uí Dhonnchadha Elaine
Publication date: 01/01/2012

We report on our ongoing work in developing the Irish Dependency Treebank, describe the results of two Inter-Annotator Agreement (IAA) studies, demonstrate improvements in annotation consistency which have a knock-on effect on parsing accuracy, and present the final set of dependency labels. We then go on to investigate the extent to which active learning can play a role in treebank and parser development by comparing an active learning bootstrapping approach to a passive approach in which sentences are chosen at random for manual revision. We show that active learning outperforms passive learning, but when annotation effort is taken into account, it is not clear how much of an advantage the active learning approach has. Finally, we present results which suggest that adding automatic parses to the training data along with manually revised parses in an active learning setup does not greatly affect parsing accuracy.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Macquarie University ResearchOnline