Search CORE

Effective pattern discovery for text mining

Authors: Li Yuefeng, Wu Sheng-Tang, Zhong Ning
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, it has often been hypothesized that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.

Queensland University of Technology ePrints Archive
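To make "pattern deploying" more concrete, the following sketch maps discovered patterns onto a term-weight vector and then scores a document with that vector. It is a simplified illustration, not the paper's exact method. It assumes that deploying means spreading each pattern's support evenly over the pattern's distinct terms, and the function names `deploy_patterns` and `score_document` are hypothetical.

```python
from collections import defaultdict


def deploy_patterns(patterns):
    """Spread each pattern's support evenly over its distinct terms.

    Hypothetical simplification of pattern deploying.
    patterns: list of (terms, support) pairs, where terms is a tuple of strings.
    Returns a dict mapping term -> accumulated weight.
    """
    weights = defaultdict(float)
    for terms, support in patterns:
        unique_terms = set(terms)
        if not unique_terms:
            continue  # an empty pattern carries no term information
        share = support / len(unique_terms)
        for term in unique_terms:
            weights[term] += share
    return dict(weights)


def score_document(doc_terms, weights):
    """Rank a document by summing the deployed weights of the distinct terms it contains."""
    return sum(weights.get(term, 0.0) for term in set(doc_terms))


if __name__ == "__main__":
    w = deploy_patterns([(("oil", "price"), 0.6), (("oil",), 0.4)])
    print(w)
    print(score_document(["price", "rise"], w))
```

For example, deploying the two patterns in the demo gives "oil" a weight of about 0.7 and "price" a weight of 0.3. Pattern evolving, which revises these weights using feedback, is not shown.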

Problem-Solving Knowledge Mining from Users’ Actions in an Intelligent Tutoring System

Authors: Couturier Olivier, Fournier-Viger Philippe, Mephu Engelbert, Nkambou Roger
Publication venue: Springer-Verlag
Publication date: 01/05/2007

In an intelligent tutoring system (ITS), the domain expert should provide relevant domain knowledge to the tutor so that it will be able to guide the learner during problem solving. However, in several domains, this knowledge is not predetermined and should be captured or learned from expert users as well as intermediate and novice users. Our hypothesis is that knowledge discovery (KD) techniques can help build this domain intelligence in an ITS. This paper proposes a framework to capture problem-solving knowledge using a promising approach of data and knowledge discovery based on a combination of sequential pattern mining and association rule discovery techniques. The framework has been implemented and is used to discover new meta knowledge and rules in a given domain, which then extend the domain knowledge and serve as a problem space, allowing the intelligent tutoring system to guide learners in problem-solving situations. Preliminary experiments have been conducted using the framework as an alternative to a path-planning problem solver in CanadarmTutor.

Archipel - Université du Québec à Montréal
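The combination of sequential pattern mining and association rules described above can be illustrated with a small sketch over recorded learner action sequences. It shows only how the support of an ordered action pattern and the confidence of a rule "prefix, then next actions" can be computed. The helper names and the action labels (`select_cam1`, `rotate_j1`, and so on) are hypothetical, and the framework's actual algorithms are not reproduced.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's actions appear in the sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # `action in it` advances the iterator, so each match must come after the previous one.
    return all(action in it for action in pattern)


def support(pattern, sequences):
    """Fraction of recorded sequences that contain the pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def rule_confidence(prefix, next_actions, sequences):
    """Confidence of the rule: prefix observed => prefix followed by next_actions."""
    base = support(prefix, sequences)
    if base == 0:
        return 0.0
    return support(tuple(prefix) + tuple(next_actions), sequences) / base


if __name__ == "__main__":
    # Hypothetical action logs from three learners.
    logs = [
        ["select_cam1", "rotate_j1", "rotate_j2"],
        ["select_cam1", "rotate_j2"],
        ["rotate_j1", "select_cam2"],
    ]
    print(support(("select_cam1", "rotate_j2"), logs))
    print(rule_confidence(("rotate_j1",), ("rotate_j2",), logs))
```

In the demo, learners who rotate joint 1 go on to rotate joint 2 in half of the logged cases, which is the kind of rule a tutor could use to suggest a next step.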

Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases

Authors: C-K Chui, CC Aggarwal, H Zhang, J Jestes, M Muzammal, Y Tong, Z Zhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Uncertain sequence databases are widely used to model data with inaccurate or imprecise timestamps in many real-world applications. In this paper, we use uniform distributions to model uncertain timestamps and adopt possible world semantics to interpret temporal uncertain databases. We design an incremental approach to manage temporal uncertainty efficiently, which is integrated into the classic pattern-growth sequential pattern mining (SPM) algorithm to mine uncertain sequential patterns. Extensive experiments show that our algorithm performs well in both efficiency and scalability.

Crossref

IUPUI ScholarWorks
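Under the uniform-timestamp model described above, a basic quantity is the probability that one uncertain event precedes another. This equals the total probability of the possible worlds in which that ordering holds. The sketch below computes this probability in closed form for two independent uniform timestamps. It is only an illustrative building block, not the paper's incremental approach, and the function names are hypothetical.

```python
def _cdf_antiderivative(y, lo, hi):
    """G(y) = integral from -inf to y of the CDF of U[lo, hi] (so G(lo) = 0)."""
    width = hi - lo
    if y <= lo:
        return 0.0
    if y <= hi:
        return (y - lo) ** 2 / (2 * width)
    return width / 2 + (y - hi)


def prob_before(a, b):
    """P(X < Y) for independent X ~ U[a[0], a[1]] and Y ~ U[b[0], b[1]].

    Both intervals must have positive width. Ties have probability zero
    for continuous uniforms, so P(X < Y) = P(X <= Y).
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_hi <= a_lo or b_hi <= b_lo:
        raise ValueError("intervals must have positive width")
    # P(X < Y) = E[F_X(Y)] = (1 / (b_hi - b_lo)) * integral of F_X over [b_lo, b_hi]
    area = _cdf_antiderivative(b_hi, a_lo, a_hi) - _cdf_antiderivative(b_lo, a_lo, a_hi)
    return area / (b_hi - b_lo)


if __name__ == "__main__":
    print(prob_before((0, 2), (1, 3)))
```

For instance, `prob_before((0, 2), (1, 3))` returns 0.875. If one event is timestamped somewhere in [0, 2] and the other somewhere in [1, 3], the first precedes the second in possible worlds carrying 87.5% of the probability mass.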

A pattern mining approach for information filtering systems

Authors: Algarni Abdulmohsen, Li Yuefeng, Xu Yue
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to address this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to identify the noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance that is also consistent for adaptive filtering.

Queensland University of Technology ePrints Archive
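The three-way term classification described above can be sketched using a specificity score. The score is defined here as the difference between a term's document coverage in positive and negative samples, divided by the number of positive documents. Both this score and the two thresholds (`theta_low`, `theta_high`) are assumptions chosen for illustration and may not match the paper's exact definitions. Offender selection and the revising strategies are omitted.

```python
def classify_terms(pos_docs, neg_docs, theta_low, theta_high):
    """Group terms as positive specific, general, or negative specific.

    Hypothetical rule: spe(t) = (positive coverage - negative coverage) / |positive docs|,
    where coverage is the number of documents containing t.
    pos_docs / neg_docs: lists of documents, each an iterable of terms.
    """
    if not pos_docs:
        raise ValueError("at least one positive document is required")
    pos_sets = [set(d) for d in pos_docs]
    neg_sets = [set(d) for d in neg_docs]
    groups = {"positive_specific": set(), "general": set(), "negative_specific": set()}
    for term in set().union(*pos_sets, *neg_sets):
        pos_cov = sum(term in d for d in pos_sets)
        neg_cov = sum(term in d for d in neg_sets)
        spe = (pos_cov - neg_cov) / len(pos_sets)
        if spe > theta_high:
            groups["positive_specific"].add(term)
        elif spe < theta_low:
            groups["negative_specific"].add(term)
        else:
            groups["general"].add(term)
    return groups


if __name__ == "__main__":
    positives = [["oil", "price", "opec"], ["oil", "opec"]]
    negatives = [["oil", "sport"]]  # e.g. a selected offender document
    print(classify_terms(positives, negatives, theta_low=0.0, theta_high=0.5))
```

With these toy documents, "opec" becomes positive specific, "oil" and "price" fall in the general band, and "sport" becomes negative specific. Each group could then be updated by its own revising strategy.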

An Efficient Approach of Discovery of Frequent Data Set from Big Operational Database

Authors: Priyanka V. Gajare, Mrs. Sonali Patil
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/07/2015

In current real-world scenarios, data uncertainty is a major issue in real-time applications, where data are generated daily by various devices and users, so it is important to identify the important data among them. In this paper, we propose to measure pattern frequentness based on possible world semantics. We aim to establish two uncertain sequence data models abstracted from many real-life applications involving uncertain sequence data and, based on them, to formulate the problem of mining probabilistically frequent sequential patterns (p-FSPs) from data that conform to our models. Using the projection strategy of the well-known PrefixSpan algorithm, we aim to develop an algorithm called U-PrefixSpan for probabilistically frequent sequential pattern mining. U-PrefixSpan avoids the problem of “possible world explosion” and, when combined with pruning techniques and a validating technique, achieves good performance. Theoretical study and analysis show that the proposed work performs better than the existing system. DOI: 10.17762/ijritcc2321-8169.15078

International Journal on Recent and Innovation Trends in Computing and Communication
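The core of probabilistic frequentness is the probability that a pattern's support reaches the minimum support across all possible worlds. The sketch below computes this with a standard dynamic program over the support distribution, rather than enumerating every world. It assumes that the probability of each sequence containing the pattern is already known and independent across sequences. The function names and the threshold `tau` are illustrative, and U-PrefixSpan's projection, pruning, and validation steps are not shown.

```python
def frequentness_probability(probs, min_sup):
    """P(support >= min_sup), where sequence i contains the pattern
    independently with probability probs[i] (a Poisson-binomial support)."""
    # dist[k] = probability that exactly k of the sequences seen so far contain the pattern
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)   # this sequence does not contain the pattern
            new[k + 1] += q * p     # this sequence contains the pattern
        dist = new
    return sum(dist[min_sup:])


def is_probabilistically_frequent(probs, min_sup, tau):
    """A pattern is a p-FSP if its frequentness probability is at least tau."""
    return frequentness_probability(probs, min_sup) >= tau


if __name__ == "__main__":
    print(frequentness_probability([0.5, 0.5], 1))
    print(is_probabilistically_frequent([0.9, 0.8, 0.1], 2, 0.7))
```

This dynamic program takes O(n^2) time for n sequences, whereas explicit enumeration would require 2^n possible worlds. That gap is what the abstract refers to as "possible world explosion".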

Mining High Utility Sequential Patterns from Uncertain Web Access Sequences using the PL-WAP

Author: Vangala Sravya
Publication venue: University of Windsor Leddy Library
Publication date: 01/01/2017

In general, web access patterns are retrieved from web access sequence databases using sequential pattern mining algorithms such as GSP, WAP, and the PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility, e.g., the amount of time a user spends on a web page) and quality (external utility, e.g., the rating of a web page within a website) information. They also do not handle uncertain sequential items (e.g., purchased products) whose existential probability lies in (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information, which can help in mining profitable patterns from a company’s website. For example, a customer may purchase bottles of ink more frequently than a printer, but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink. Most existing uncertain sequential pattern algorithms, such as U-Apriori, UF-Growth, and U-PLWAP, do not include utility measures. In U-PLWAP, the web sequences are derived from web log data without including the time spent by the user, and the web pages are not associated with any rating. When these two utilities are considered, items with lower existential probability can sometimes be more profitable to the website owner. Among traditional utility-based algorithms, the only one that addresses both uncertainty and high utility is PHUI-UP. However, PHUI-UP treats probability and utility as separate entities, and because it uses two different thresholds, the retrieved patterns do not depend on both jointly; it also does not mine uncertain web access sequence databases. This thesis proposes the HUU-PLWAP miner for mining uncertain sequential patterns with internal and external utility information using a PLWAP tree approach, which cuts down the repeated database scans required by level-wise approaches.
HUU-PLWAP uses uncertain internal utility values (derived from the sequence uncertainty model) and constant, predefined external utility values to retrieve high utility sequential patterns from uncertain web access sequence databases with the help of the U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP and 75% faster than the PHUI-UP algorithm.

Scholarship at UWindsor
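The ink-and-printer argument above can be made concrete with a small sketch that combines probability, internal utility, and external utility into a single expected-utility score per item. The scoring rule (probability × quantity × external utility, summed over sequences) and the function names are assumptions for illustration and may differ from HUU-PLWAP's actual definitions. The sketch scores single items rather than sequential patterns and does not build a PLWAP tree.

```python
from collections import defaultdict


def expected_utilities(sequences, external_utility):
    """Expected utility per item, summed over all sequences (hypothetical definition).

    sequences: list of sequences, each a list of (item, quantity, probability) triples,
      where quantity plays the role of internal utility and probability is the
      item's existential probability in that sequence.
    external_utility: dict mapping item -> predefined unit utility (e.g., profit).
    """
    totals = defaultdict(float)
    for seq in sequences:
        for item, quantity, prob in seq:
            totals[item] += prob * quantity * external_utility[item]
    return dict(totals)


def high_utility_items(sequences, external_utility, min_util):
    """Items whose combined score reaches a single threshold min_util, sorted by name."""
    totals = expected_utilities(sequences, external_utility)
    return sorted(item for item, u in totals.items() if u >= min_util)


if __name__ == "__main__":
    purchases = [
        [("ink", 2, 0.9)],
        [("ink", 1, 0.8)],
        [("ink", 3, 0.7), ("printer", 1, 0.6)],
    ]
    unit_profit = {"ink": 5.0, "printer": 120.0}
    print(expected_utilities(purchases, unit_profit))
    print(high_utility_items(purchases, unit_profit, 50))
```

In this toy data, ink appears in every sequence but scores about 23.5, while the printer appears once with a lower probability yet scores about 72. Folding probability and utility into one score allows a single threshold to be applied. This is one possible reading of the abstract's criticism of PHUI-UP's two separate thresholds.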