Using Information Filtering in Web Data Mining Process
The Web service-oriented Grid is becoming a standard for achieving loosely coupled distributed computing, and Grid services can easily be specified with web-service based interfaces. In this paper we first envisage a realistic Grid market in which end-users, brokers and service providers participate co-operatively with the aim of meeting requirements and earning profit. End-users wish to use the functionality of Grid services at the minimum possible price, or at a price confined within a specified budget. Brokers aim to maximise profit whilst establishing an SLA (Service Level Agreement) and satisfying end-user needs, and at the same time resisting the volatility of service execution time and availability. Service providers aim to develop price models, based on end-user or broker demands, that will maximise their profit. We focus on developing stochastic approaches to end-user workflow scheduling that provide QoS guarantees by establishing an SLA. We also develop a novel 2-stage stochastic programming technique that aims at establishing an SLA with end-users on satisfying their workflow QoS requirements. Finally, we develop a scheduling (workload allocation) technique based on linear programming that embeds the negotiated workflow QoS into the program and models Grid services as generalised queues. This technique is shown to outperform existing scheduling techniques that do not rely on real-time performance information.
Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus in a matrix whose elements are the
frequency of words. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular method to address sparsity
and high-dimensionality issues. Among the different strategies for developing DR
methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps
all words onto a new basis to represent the BOW. The recent growth of text data and
its challenges imply that the DR area still needs new perspectives. Although a wide
range of methods based on the UFT strategy has been developed, the fuzzy
approach has not been considered for DR based on this strategy. This research
investigates the application of fuzzy clustering as a DR method based on the
UFT strategy to collapse the BOW matrix and provide a lower-dimensional
representation of the documents, instead of the words, in a corpus. The quantitative
evaluation shows that fuzzy clustering produces better performance and
features than Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy.
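The idea in this abstract can be sketched in a few lines. The abstract does not give the authors' formulation, so the sketch below assumes standard fuzzy c-means on a toy corpus; the documents, the function names `bag_of_words` and `fuzzy_c_means`, the fuzzifier `m` and the choice of initial centres are all hypothetical. Each document's cluster-membership vector serves as its lower-dimensional representation.

```python
from collections import Counter

def bag_of_words(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts word j in document i."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([counts[w] for w in vocab])
    return vocab, rows

def fuzzy_c_means(rows, init_centers, m=2.0, iters=50):
    """Standard fuzzy c-means (an assumption, not the paper's exact method).

    Returns one membership vector per row; each vector sums to 1.
    """
    centers = [list(c) for c in init_centers]
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
                     for c in centers]
            if any(d == 0 for d in dists):
                # A point sitting exactly on a centre belongs fully to it.
                zero = [1.0 if d == 0 else 0.0 for d in dists]
                u = [z / sum(zero) for z in zero]
            else:
                power = 2.0 / (m - 1.0)
                u = [1.0 / sum((di / dk) ** power for dk in dists)
                     for di in dists]
            memberships.append(u)
        # Move each centre to the membership-weighted mean of all rows.
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            centers[k] = [sum(w * x[j] for w, x in zip(weights, rows)) / total
                          for j in range(len(rows[0]))]
    return memberships

# Hypothetical toy corpus: two documents per topic.
docs = [
    "grid service scheduling grid",
    "grid scheduling service",
    "text mining pattern text",
    "pattern mining text",
]
vocab, rows = bag_of_words(docs)
# Two clusters, seeded with one document from each topic (an arbitrary choice).
memberships = fuzzy_c_means(rows, init_centers=[rows[0], rows[2]])
for doc, u in zip(docs, memberships):
    print(doc, [round(v, 3) for v in u])
```

Here the six-column BOW rows collapse to two-dimensional membership vectors. With a real corpus the number of clusters would set the reduced dimensionality.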
Class Association Rules Mining based Rough Set Method
This paper investigates the mining of class association rules with a rough set
approach. In data mining, an association occurs between two sets of elements
when one element set occurs together with another. A class association rule set
(CARs) is a subset of association rules with classes specified as their
consequents. We present an efficient algorithm, inspired by the Apriori
algorithm, for mining the finest class rule set, where the support and
confidence are computed based on the elementary sets of the lower approximation
from rough set theory. Our proposed approach is shown to be very
effective, and the rough set approach for class association discovery is much
simpler than the classic association method.
Comment: 10 pages, 2 figures
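The quantities named in this abstract can be illustrated on a toy decision table. This is only a sketch using the textbook definitions of elementary sets, lower approximation, support and confidence; it is not the authors' algorithm, and the records, attribute names and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical decision table: condition attributes plus a class label.
records = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]

def elementary_sets(records, attributes):
    """Group record indices that are indiscernible on the given attributes."""
    groups = defaultdict(list)
    for i, (conds, _) in enumerate(records):
        groups[tuple(conds[a] for a in attributes)].append(i)
    return dict(groups)

def lower_approximation(records, attributes, label):
    """Indices in elementary sets whose records all carry the class label."""
    lower = []
    for members in elementary_sets(records, attributes).values():
        if all(records[i][1] == label for i in members):
            lower.extend(members)
    return sorted(lower)

def rule_metrics(records, antecedent, label):
    """Support and confidence of the class association rule antecedent -> label."""
    matches = [lab for conds, lab in records
               if all(conds[a] == v for a, v in antecedent.items())]
    hits = sum(1 for lab in matches if lab == label)
    support = hits / len(records)
    confidence = hits / len(matches) if matches else 0.0
    return support, confidence

play_lower = lower_approximation(records, ["outlook", "windy"], "play")
support, confidence = rule_metrics(records, {"windy": "no"}, "play")
print("lower approximation of 'play':", play_lower)
print("windy=no -> play: support", support, "confidence", confidence)
```

Records 4 and 5 share the same condition values but carry different classes, so their elementary set is excluded from the lower approximation of either class. Only rules built from the consistent elementary sets have confidence 1.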
Effective pattern discovery for text mining
Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopt term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based (or phrase-based) approaches should perform better than term-based ones, but many experiments have not supported this hypothesis. This paper presents an innovative technique, effective pattern discovery, which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on the RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance.
Datamining for Web-Enabled Electronic Business Applications
Web-enabled electronic business is generating a massive amount of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to all the data being collected to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.
A comparative study of the AHP and TOPSIS methods for implementing load shedding scheme in a pulp mill system
Advances in technology have encouraged the design and creation of useful
equipment and devices that can be put to work in a wide range of
applications. A pulp mill is a heavy industry that consumes a large amount of
electricity in its production, so any malfunction of the equipment can
cause heavy losses to the company. In particular, the breakdown of one generator
can cause the other generators to be overloaded. Subsequent
loads are then shed until the remaining generators can supply power to the other
loads. Once the fault has been fixed, the load shedding scheme can be deactivated.
A load shedding scheme is thus the best way of handling such a condition: selected
loads are shed under the scheme in order to protect the generators from
damage. Multi-Criteria Decision Making (MCDM) can be applied to determine
the load shedding scheme in an electric power system. In this thesis, two MCDM
methods, the Analytic Hierarchy Process (AHP) and the Technique for Order Preference by
Similarity to Ideal Solution (TOPSIS), are introduced and applied, and
a series of analyses is conducted. The results show that, of the two
methods, TOPSIS is the better
MCDM method for the load shedding scheme in the pulp mill
system, because it gives the higher percentage
effectiveness of load shedding. The results of applying the AHP
and TOPSIS analyses to the pulp mill system are very promising.
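For readers unfamiliar with TOPSIS, a minimal sketch of the standard procedure follows: vector normalisation, weighting, distances to the ideal and anti-ideal solutions, and a closeness score. The thesis's actual criteria, weights and load data are not given in the abstract, so the candidate loads, criteria and numbers below are hypothetical.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative (higher is better).

    matrix[i][j] is the value of alternative i on criterion j;
    benefit[j] is True when larger values of criterion j are preferred.
    Assumes no criterion column is all zeros and alternatives are not
    all identical (otherwise a division by zero would occur).
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col)
            for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col)
             for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)    # distance to the ideal solution
        d_worst = math.dist(row, worst)  # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical candidate loads scored on two made-up criteria:
# power relieved when shed (benefit) and production impact (cost).
loads = ["load A", "load B", "load C"]
matrix = [
    [10.0, 1.0],
    [5.0, 2.0],
    [1.0, 5.0],
]
scores = topsis(matrix, weights=[0.6, 0.4], benefit=[True, False])
ranking = [name for _, name in sorted(zip(scores, loads), reverse=True)]
print(ranking)
```

In this toy case the highest-scoring load would be the first candidate for shedding. AHP would instead derive the ranking from pairwise comparison matrices.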