5 research outputs found

    Using Information Filtering in Web Data Mining Process

    Web service-oriented Grid is becoming a standard for achieving loosely coupled distributed computing, and Grid services can easily be specified with web-service based interfaces. In this paper we first envisage a realistic Grid market with players such as end-users, brokers and service providers participating co-operatively with an aim to meet requirements and earn profit. End-users wish to use the functionality of Grid services while paying the minimum possible price, or a price confined within a specified budget. Brokers aim to maximise profit whilst establishing an SLA (Service Level Agreement), satisfying end-user needs and, at the same time, resisting the volatility of service execution time and availability. Service providers aim to develop price models, based on end-user or broker demand, that maximise their profit. In this paper we focus on developing stochastic approaches to end-user workflow scheduling that provide QoS guarantees by establishing an SLA. We also develop a novel two-stage stochastic programming technique that aims at establishing an SLA with end-users regarding the satisfaction of their workflow QoS requirements. We then develop a scheduling (workload allocation) technique based on linear programming that embeds the negotiated workflow QoS into the program and models Grid services as generalised queues. This technique is shown to outperform existing scheduling techniques that do not rely on real-time performance information.
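
    The workload allocation step can be pictured as a small linear program. The sketch below is not the paper's formulation; the per-service costs, expected response times, and SLA deadline are illustrative assumptions. It minimises the price paid for executing a workflow while keeping the expected completion time within the negotiated QoS deadline, using scipy's linprog.

```python
# Minimal sketch of QoS-constrained workload allocation as a linear program.
# All numbers are assumed, illustrative values, not taken from the paper.
from scipy.optimize import linprog

cost = [4.0, 2.5, 1.0]   # price per unit of work on each Grid service (assumed)
resp = [0.5, 1.2, 3.0]   # expected response time per unit of work (assumed)
deadline = 1.5           # QoS deadline negotiated in the SLA (assumed)

# Decision variables x_i: fraction of the workload sent to service i.
# Minimise total cost subject to:
#   resp . x <= deadline   (expected completion time meets the SLA)
#   sum(x)  == 1           (all work is allocated)
res = linprog(
    c=cost,
    A_ub=[resp], b_ub=[deadline],
    A_eq=[[1.0, 1.0, 1.0]], b_eq=[1.0],
    bounds=[(0.0, 1.0)] * 3,
)
print(res.x)  # workload share assigned to each service
```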

    Concept learning of text documents

    Concept learning of text documents can be viewed as the problem of acquiring the definition of a general category of documents. To define the category of a text document, a conjunction of keywords is usually used, and these keywords should be few and comprehensible. A naïve method is to enumerate all combinations of keywords and extract suitable ones. However, because of the enormous number of keyword combinations, it is impractical to extract the most relevant keywords describing a category of documents by enumerating every possible combination. Many heuristic methods have been proposed, such as GA-based and immune-based algorithms. In this work, we introduce a pruning power technique and propose a robust enumeration-based concept learning algorithm. Experimental results show that the rules produced by our approach are more comprehensible and simpler than those produced by other methods.
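
    As a rough illustration of enumeration-based concept learning with pruning, the sketch below searches conjunctions of keywords in order of increasing length and skips any conjunction that cannot beat the best rule found so far. The paper's specific pruning power measure is not reproduced; the coverage-based bound, the purity criterion, and all names are assumptions.

```python
# Minimal sketch: induce a conjunctive keyword rule by enumeration with pruning.
from itertools import combinations

def covers(rule, doc_words):
    # A conjunctive rule fires only if every keyword occurs in the document.
    return all(k in doc_words for k in rule)

def learn_conjunction(pos_docs, neg_docs, vocab, max_len=3):
    best_rule, best_tp = None, 0
    for size in range(1, max_len + 1):            # shorter rules first: simpler concepts
        for rule in combinations(sorted(vocab), size):
            tp = sum(covers(rule, d) for d in pos_docs)
            if tp <= best_tp:
                # Prune: skip the scan over negatives; this conjunction covers no
                # more positives than the best rule found so far.
                continue
            fp = sum(covers(rule, d) for d in neg_docs)
            if fp == 0:                            # consistent with all negative documents
                best_rule, best_tp = rule, tp
    return best_rule

# Toy usage with documents represented as sets of words (illustrative data).
pos = [{"grid", "broker", "workflow"}, {"grid", "service", "scheduling"}]
neg = [{"keyword", "rule", "induction"}]
print(learn_conjunction(pos, neg, set().union(*pos, *neg)))  # e.g. ('grid',)
```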

    Keyword extraction for text categorization

    Text categorization (TC) is one of the main applications of machine learning. Many methods have been proposed, such as the Rocchio method, Naive Bayes based methods, and SVM based text classification methods. These methods learn from labeled text documents and then construct a classifier, so that the category of a newly arriving text document can be predicted. However, these methods do not give a description of each category. In the machine learning field there are many concept learning algorithms, such as ID3 and CN2. This paper proposes a more robust algorithm to induce concepts from training examples, based on enumeration of all possible keyword combinations. Experimental results show that the rules produced by our approach have higher precision and greater simplicity than those of other methods.
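
    For contrast with the rule-induction approach, the following sketch shows one of the classifier-style baselines the abstract mentions: a Naive Bayes text categorizer built with scikit-learn. It predicts a category for a new document but, as noted above for such methods, gives no keyword-level description of the category. The toy documents and labels are placeholders.

```python
# Illustrative Naive Bayes baseline for text categorization (toy data only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["grid service scheduling", "stochastic workflow broker",
              "keyword rule induction", "concept learning of documents"]
train_labels = ["grid", "grid", "text", "text"]

# Bag-of-words features followed by a multinomial Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_docs, train_labels)
print(clf.predict(["broker schedules a grid workflow"]))  # expected: ['grid']
```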

    Concept Learning of Text Documents

    Abstract

    Cooperative strategy for web data mining and cleaning

    While the Internet and World Wide Web have put a huge volume of low-quality information within easy reach of information gathering systems, filtering out irrelevant information has become a major challenge. In this paper, a Web data mining and cleaning strategy for information gathering is proposed. A data-mining model is presented for data that come from multiple agents. Using the model, a data-cleaning algorithm is then presented to eliminate irrelevant data. To evaluate the data-cleaning strategy, an interpretation of the mining model is given in terms of evidence theory. An experiment is also conducted to evaluate the strategy on Web data. The experimental results show that the proposed strategy is efficient and promising.
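
    The evidence-theory interpretation can be illustrated with Dempster's rule of combination: each agent assigns mass to "relevant", "irrelevant", or the whole frame, the masses from different agents are combined, and documents with low combined belief in relevance are cleaned out. The mass values and the cleaning threshold below are assumptions, not the paper's parameters.

```python
# Minimal Dempster-Shafer sketch for combining agents' relevance assessments.
from itertools import product

FRAME = frozenset({"relevant", "irrelevant"})

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions over FRAME."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb                      # mass assigned to conflicting sets
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Two agents' assessments of one Web document (illustrative numbers).
agent1 = {frozenset({"relevant"}): 0.6, FRAME: 0.4}
agent2 = {frozenset({"relevant"}): 0.5, frozenset({"irrelevant"}): 0.2, FRAME: 0.3}

m = combine(agent1, agent2)
belief_relevant = m.get(frozenset({"relevant"}), 0.0)
keep = belief_relevant >= 0.5                        # cleaning threshold (assumed)
print(round(belief_relevant, 3), keep)
```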