22,813 research outputs found
Effective pattern discovery for text mining
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
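To make the term-based versus pattern-based contrast concrete, here is a toy miner of frequent co-occurring termsets. This is only an illustrative sketch of pattern-based text representation, not the paper's actual pattern deploying or evolving procedures; all data and thresholds are made up:

```python
from collections import Counter
from itertools import combinations

def frequent_termsets(documents, min_support=2, max_size=2):
    """Count termsets (co-occurring word patterns) that appear in at
    least `min_support` documents; a toy stand-in for pattern-based
    (rather than single-term) text representation."""
    counts = Counter()
    for doc in documents:
        terms = set(doc.lower().split())
        for size in range(1, max_size + 1):
            for pattern in combinations(sorted(terms), size):
                counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

docs = ["data mining finds patterns",
        "text mining uses patterns",
        "patterns support text classification"]
patterns = frequent_termsets(docs)
# The pair ('mining', 'patterns') survives; rare single terms do not.
```

A pattern such as `('mining', 'patterns')` carries more context than either term alone, which is the intuition behind phrase-based representations.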
A review of associative classification mining
Associative classification mining is a promising approach in data mining that utilizes the
association rule discovery techniques to construct classification systems, also known as
associative classifiers. In the last few years, a number of associative classification algorithms
have been proposed, such as CPAR, CMAR, MCAR, and MMAC. These algorithms
employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule
evaluation methods. This paper focuses on surveying and comparing the state-of-the-art associative
classification techniques with regard to the above criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.
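The pipeline shared by these algorithms (mine class association rules, rank them, classify with the first matching rule) can be sketched as follows. The ranking shown is CBA-style (confidence, then support), and the data and thresholds are illustrative, not taken from any of the surveyed systems:

```python
from collections import Counter
from itertools import combinations

def mine_class_rules(transactions, min_support=2, min_conf=0.6):
    """Mine class association rules (antecedent itemset -> class)."""
    ante_counts, rule_counts = Counter(), Counter()
    for items, label in transactions:
        for r in range(1, len(items) + 1):
            for ante in combinations(sorted(items), r):
                ante_counts[ante] += 1
                rule_counts[(ante, label)] += 1
    rules = []
    for (ante, label), supp in rule_counts.items():
        conf = supp / ante_counts[ante]
        if supp >= min_support and conf >= min_conf:
            rules.append((ante, label, supp, conf))
    # CBA-style ranking: higher confidence first, then higher support.
    rules.sort(key=lambda rule: (-rule[3], -rule[2]))
    return rules

def predict(rules, items, default=None):
    """Classify with the highest-ranked rule whose antecedent matches."""
    for ante, label, _, _ in rules:
        if set(ante) <= set(items):
            return label
    return default

train = [({"a", "b"}, "X"), ({"a", "b"}, "X"),
         ({"a", "c"}, "Y"), ({"b", "c"}, "Y")]
rules = mine_class_rules(train)
label = predict(rules, {"a", "b"})
```

The surveyed algorithms differ mainly in how each of these three stages (discovery, ranking/pruning, prediction) is realized.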
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows trading off model complexity against
goodness of fit, by which overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion.
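A minimal sketch of the MDL idea for rule lists: a model's score is its own description length plus the bits needed to encode the labels given the model. Only the data part is shown here; Classy's actual encoding of the model and of the class distributions is more involved, and the rule list below is invented for illustration:

```python
import math

def data_code_length(rule_list, examples):
    """L(data | model): negative log2-likelihood of each label under
    the class distribution of the first rule whose condition fires."""
    bits = 0.0
    for features, label in examples:
        for condition, class_probs in rule_list:
            if condition(features):
                bits += -math.log2(class_probs[label])
                break
    return bits

# Toy probabilistic rule list: one rule plus a default rule that
# always fires (so every example is covered).
rule_list = [
    (lambda f: "rain" in f, {"wet": 0.9, "dry": 0.1}),
    (lambda f: True,        {"wet": 0.2, "dry": 0.8}),
]
examples = [({"rain"}, "wet"), (set(), "dry")]
bits = data_code_length(rule_list, examples)
# MDL model selection minimizes L(model) + L(data | model): a longer
# rule list pays for itself only if it compresses the labels enough.
```

This is why compression on the training set can serve as the selection criterion: fewer total bits means a better balance of fit and complexity.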
Quantitative Redundancy in Partial Implications
We survey the different properties of an intuitive notion of redundancy, as a
function of the precise semantics given to the notion of partial implication.
The final version of this survey will appear in the Proceedings of the Int.
Conf. Formal Concept Analysis, 2015.
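For concreteness: a partial implication X → Y holds at threshold γ when supp(X ∪ Y)/supp(X) ≥ γ, and one intuitive redundancy is that X → YZ entails X → Y, since the latter's confidence can never be smaller. A small sketch with invented data (the survey itself studies several finer-grained semantics of redundancy):

```python
def support(itemset, transactions):
    """Number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    """Confidence of the partial implication x -> y:
    supp(x | y) / supp(x)."""
    return support(x | y, transactions) / support(x, transactions)

# Illustrative transactions only.
ts = [{"a", "b", "c"}, {"a", "b"}, {"a"}]
c_yz = confidence({"a"}, {"b", "c"}, ts)  # a -> bc
c_y = confidence({"a"}, {"b"}, ts)        # a -> b
# c_y >= c_yz always, so stating a -> b adds nothing once a -> bc holds.
```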
Mining Traversal Patterns from Weighted Traversals and Graph
Many real-world problems can be modeled as graphs together with transactions that traverse those graphs. For example, the link structure of web pages can be represented as a graph, and a user's path through those pages can be modeled as a transaction traversing that graph. Finding important and valuable patterns in such graph traversals is therefore a meaningful task. Previous research on this problem proposed algorithms that find only frequent patterns, without considering the weights of the traversals or of the graph. The limitation of these algorithms is that they have difficulty discovering more reliable and accurate patterns.
This thesis proposes two methods that discover patterns by taking into account weights assigned to the traversals or to the vertices of a graph. The first method discovers frequent traversal patterns when weights are attached to the traversal information of the graph. Examples of such traversal weights include the dwell time on a page or the time taken to move from one page to another while visiting a web site. To mine more accurate traversal patterns, this thesis uses statistical confidence intervals: a confidence interval is computed from the weights assigned to each edge across all traversals, and only the traversals that fall within the interval are accepted as valid. Applying this method yields more reliable traversal patterns. In addition, a method for determining priorities among the discovered patterns using the pattern and graph information, and an algorithm for improving performance, are also proposed.
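The first method's confidence-interval filter can be sketched as follows. This is a simplified reading that assumes roughly normal edge weights and a 95% interval; the thesis's exact procedure may differ, and the dwell-time data below is hypothetical:

```python
import math

def confidence_interval(weights, z=1.96):
    """95% confidence interval for the mean of one edge's observed
    weights (e.g., page-to-page move times)."""
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

def valid_traversals(traversals, edge_weights):
    """Keep only traversals whose weight on every edge lies inside
    that edge's confidence interval."""
    intervals = {e: confidence_interval(ws)
                 for e, ws in edge_weights.items()}
    kept = []
    for tid, edges in traversals.items():
        if all(intervals[e][0] <= w <= intervals[e][1]
               for e, w in edges.items()):
            kept.append(tid)
    return kept

# Hypothetical dwell times on the edge A->B; 5.0 is an outlier.
edge_weights = {"A->B": [1.0, 1.1, 0.9, 1.0, 5.0]}
traversals = {"t1": {"A->B": 1.0}, "t2": {"A->B": 5.0}}
kept = valid_traversals(traversals, edge_weights)
```

Filtering out traversals with anomalous weights is what makes the patterns mined afterwards more reliable.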
The second method discovers weight-aware frequent traversal patterns when weights are assigned to the vertices of the graph. Examples of such vertex weights include the information content or importance of each document within a web site. In this setting, deciding whether a traversal pattern is frequent must take into account not only the pattern's occurrence frequency but also the weights of the visited vertices. To this end, this thesis proposes an algorithm that, instead of pruning them at each mining step, retains candidate patterns that may later become frequent according to the vertex weights. An algorithm that reduces the number of candidate patterns to improve performance is also proposed.
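The reason candidates must be retained rather than pruned is that weighted support is not anti-monotone: a superpattern can score higher than its subpattern. The sketch below uses one common formulation of weighted support (frequency times mean vertex weight) and an optimistic max-weight bound for retention; both are assumptions for illustration, not necessarily the thesis's exact definitions:

```python
def frequency(pattern, traversals):
    """Number of traversals containing every vertex of the pattern."""
    return sum(1 for t in traversals if all(v in t for v in pattern))

def weighted_support(pattern, traversals, vertex_weights):
    """Frequency scaled by the mean weight of the pattern's vertices."""
    avg_w = sum(vertex_weights[v] for v in pattern) / len(pattern)
    return frequency(pattern, traversals) * avg_w

def keep_candidate(pattern, traversals, min_wsup, max_weight):
    """Optimistic bound: retain the candidate if frequency times the
    maximum vertex weight could still reach the threshold."""
    return frequency(pattern, traversals) * max_weight >= min_wsup

traversals = [{"A", "B"}, {"A", "B", "C"}, {"A"}]
weights = {"A": 0.2, "B": 0.9, "C": 0.5}

low = weighted_support(("A",), traversals, weights)       # 3 * 0.2
high = weighted_support(("A", "B"), traversals, weights)  # 2 * 0.55
# The superpattern outscores its subpattern, so ("A",) must be kept:
retained = keep_candidate(("A",), traversals,
                          min_wsup=1.0, max_weight=0.9)
```

Tightening this bound (for example, using only weights of vertices reachable from the pattern) is exactly the kind of refinement Chapter 4's support-bound estimation addresses.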
For the two proposed methods, a variety of experiments compare and analyze measures such as execution time and the number of generated patterns.
This thesis has presented new methods for discovering frequent traversal patterns both when traversals carry weights and when graph vertices carry weights. Applying the proposed methods to fields such as web mining can enable efficient restructuring of web sites, faster access to web documents, and the construction of web documents personalized to individual users.
Chapter 1 Introduction
1.1 Overview
1.2 Motivations
1.3 Approach
1.4 Organization of Thesis
Chapter 2 Related Works
2.1 Itemset Mining
2.2 Weighted Itemset Mining
2.3 Traversal Mining
2.4 Graph Traversal Mining
Chapter 3 Mining Patterns from Weighted Traversals on
Unweighted Graph
3.1 Definitions and Problem Statements
3.2 Mining Frequent Patterns
3.2.1 Augmentation of Base Graph
3.2.2 In-Mining Algorithm
3.2.3 Pre-Mining Algorithm
3.2.4 Priority of Patterns
3.3 Experimental Results
Chapter 4 Mining Patterns from Unweighted Traversals on
Weighted Graph
4.1 Definitions and Problem Statements
4.2 Mining Weighted Frequent Patterns
4.2.1 Pruning by Support Bounds
4.2.2 Candidate Generation
4.2.3 Mining Algorithm
4.3 Estimation of Support Bounds
4.3.1 Estimation by All Vertices
4.3.2 Estimation by Reachable Vertices
4.4 Experimental Results
Chapter 5 Conclusions and Further Works
Reference
Learning from Ontology Streams with Semantic Concept Drift
Data stream learning has been largely studied for extracting knowledge
structures from continuous and rapid data records. In the semantic Web, data is
interpreted in ontologies and its ordered sequence is represented as an
ontology stream. Our work exploits the semantics of such streams to tackle the
problem of concept drift, i.e., unexpected changes in data distribution that cause
most models to become less accurate as time passes. To this end we revisited (i)
semantic inference in the context of supervised stream learning, and (ii)
models with semantic embeddings. The experiments show accurate prediction on
data from Dublin and Beijing.
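For context, a classic drift detector tracks a model's streaming error rate and signals drift when it degrades well past its best observed level. The sketch below is DDM-style and purely generic; the paper's own contribution is to exploit ontology semantics rather than error statistics like these:

```python
import math

def detect_drift(error_stream, drift_factor=3.0, warmup=30):
    """Flag drift when the running error rate p (with std s) exceeds
    the best level seen so far by drift_factor standard deviations."""
    n, errors = 0, 0
    p_min, s_min = float("inf"), float("inf")
    for err in error_stream:  # err is 1 for a misprediction, else 0
        n += 1
        errors += err
        p = errors / n
        s = math.sqrt(p * (1 - p) / n)
        if n >= warmup:
            if p + s < p_min + s_min:
                p_min, s_min = p, s  # new best error level
            elif p + s > p_min + drift_factor * s_min:
                return n  # position at which drift is signalled
    return None

# An accurate phase followed by a degraded phase:
stream = [1] + [0] * 49 + [1] * 50
drift_at = detect_drift(stream)
```

Detectors like this only react after accuracy has already dropped, which is precisely the limitation that motivates anticipating drift from the semantics of the stream itself.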
- …