31 research outputs found
Compiler and runtime support for shared memory parallelization of data mining algorithms
Abstract. Data mining techniques focus on finding novel and useful patterns or models from large datasets. Because of the volume of the data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require the use of parallel machines. We have been developing compiler and runtime support for developing scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets. In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware where these techniques are implemented. Then, we present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using apriori association mining and k-means clustering.
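The reduction-style shared-memory parallelization the abstract alludes to can be illustrated with a minimal k-means sketch (this is an illustrative example, not the authors' middleware or compiler output): each worker accumulates per-cluster partial sums and counts over its chunk of the data, and the partial accumulators are then merged in a global reduction.

```python
# Illustrative sketch of shared-memory parallel k-means as a reduction:
# workers build private per-cluster accumulators over their data chunks,
# which are then merged. Names and structure here are assumptions for
# exposition, not the paper's actual interface.
from concurrent.futures import ThreadPoolExecutor

def partial_assign(points, centroids):
    """Accumulate per-cluster sums and counts for one chunk of 1-D points."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for p in points:
        c = min(range(k), key=lambda i: abs(p - centroids[i]))
        sums[c] += p
        counts[c] += 1
    return sums, counts

def kmeans(points, centroids, workers=2, iters=10):
    chunks = [points[i::workers] for i in range(workers)]
    for _ in range(iters):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(lambda ch: partial_assign(ch, centroids),
                                     chunks))
        # Global reduction: merge the per-worker accumulators, then
        # recompute centroids from the merged totals.
        k = len(centroids)
        sums = [sum(p[0][i] for p in partials) for i in range(k)]
        counts = [sum(p[1][i] for p in partials) for i in range(k)]
        centroids = [sums[i] / counts[i] if counts[i] else centroids[i]
                     for i in range(k)]
    return centroids

print(kmeans([0.0, 1.0, 10.0, 11.0], [0.0, 10.0]))  # → [0.5, 10.5]
```

The same accumulate-then-merge pattern applies to apriori association mining (per-thread candidate-count tables merged at the end of each pass), which is why one set of techniques can cover both tasks.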
EXPLOITING HIGHER ORDER UNCERTAINTY IN IMAGE ANALYSIS
Soft computing is a group of methodologies that work synergistically to provide flexible information-processing capability for handling real-life ambiguous situations. Its aim is to exploit the tolerance for imprecision, uncertainty, approximate reasoning, and partial truth in order to achieve tractability, robustness, and low-cost solutions. Soft computing methodologies (involving fuzzy sets, neural networks, genetic algorithms, and rough sets) have been successfully employed in various image processing tasks, including image segmentation, enhancement and classification, both individually and in combination with other soft computing techniques. The reason for such success is that soft computing techniques provide powerful tools to describe the uncertainty naturally embedded in images, which can be exploited in various image processing tasks. The main contribution of this thesis is to present tools for handling uncertainty by means of a rough-fuzzy framework for exploiting feature-level uncertainty. The first contribution is the definition of a general framework based on the hybridization of rough and fuzzy sets, along with a new operator called RF-product, as an effective solution to some problems in image analysis. The second and third contributions are devoted to proving the effectiveness of the proposed framework, by presenting a vector-quantization-based compression method, an analysis of its compression capabilities, and an HSV color image segmentation technique
Front Matter - Soft Computing for Data Mining Applications
Efficient tools and algorithms for knowledge discovery in large data sets have been devised in recent years. These methods exploit the capability of computers to search huge amounts of data in a fast and effective manner. However, the data to be analyzed is imprecise and afflicted with uncertainty. In the case of heterogeneous data sources such as text, audio and video, the data may moreover be ambiguous and partly conflicting. Besides, patterns and relationships of interest are usually vague and approximate. Thus, in order to make the information mining process more robust, or human-like, methods for searching and learning require tolerance towards imprecision, uncertainty and exceptions; they must have approximate reasoning capabilities and be capable of handling partial truth. Properties of this kind are typical of soft computing. Soft computing techniques like Genetic
Criteria of Empirical Significance: Foundations, Relations, Applications
This dissertation consists of three parts. Part I is a defense of an artificial language methodology in philosophy and a historical and systematic defense of the logical empiricists' application of an artificial language methodology to scientific theories. These defenses provide a justification for the presumptions of a host of criteria of empirical significance, which I analyze, compare, and develop in part II. On the basis of this analysis, in part III I use a variety of criteria to evaluate the scientific status of intelligent design, and further discuss confirmation, reduction, and concept formation
Knowledge discovery for moderating collaborative projects
In today's global market environment, enterprises are increasingly turning towards collaboration in projects to leverage their resources, skills and expertise, and simultaneously address the challenges posed in diverse and competitive markets. Moderators, which are knowledge-based systems, have been used successfully to support collaborative teams by raising awareness of problems or conflicts. However, the functioning of a moderator is limited by the knowledge it has about the team members. Knowledge acquisition, learning and updating of knowledge are the major challenges for a Moderator's implementation. To address these challenges, a Knowledge discOvery And daTa minINg inteGrated (KOATING) framework is presented for Moderators, enabling them to continuously learn from the operational databases of the company and semi-automatically update the corresponding expert module. The architecture for the Universal Knowledge Moderator (UKM) shows how existing moderators can be extended to support global manufacturing. A method for designing and developing the knowledge acquisition module of the Moderator, for manual and semi-automatic update of knowledge, is documented using the Unified Modelling Language (UML). UML has been used to explore the static structure and dynamic behaviour, and to describe the system analysis, design and development aspects of the proposed KOATING framework. The proof of design is presented using a case study of a collaborative project in the form of a construction project supply chain. It is shown that Moderators can "learn" by extracting various kinds of knowledge from Post Project Reports (PPRs) using different types of text mining techniques. Furthermore, it is also proposed that knowledge discovery integrated moderators can be used to support and enhance collaboration by identifying appropriate business opportunities and corresponding partners for the creation of a virtual organization. A case study is presented in the context of a UK-based SME. Finally, the thesis concludes by summarizing the work, outlining its novelties and contributions, and recommending future research.