The volume of information available on the Internet and corporate intranets continues to grow, as does
the volume of structured and unstructured data stored by many organizations.
In recent years, data mining techniques have been used to explore large volumes of structured data in
order to discover knowledge, often in the form of a decision support system. For effective decision making,
there is a need to discover knowledge from both structured and unstructured data for completeness and
comprehensiveness.
This paper presents a framework for discovering this kind of knowledge and reports on work in progress in
an ongoing research project. The proposed framework is composed of three basic phases: extraction and
integration, data mining, and an assessment of the relevance of such a system to the business decision
support system. In the first phase, structured and unstructured data are combined to form an XML
database, the combined data warehouse (CDW). Efficiency is enhanced by clustering the unstructured data
(documents) with the SOM (Self-Organizing Map) clustering algorithm and by extracting keyphrases, based on
training and TF/IDF (Term Frequency/Inverse Document Frequency), with the KEA (Keyphrase Extraction
Algorithm) toolkit. In the second phase, association rule mining is applied to discover knowledge from
the combined data warehouse. The final phase reflects the changes that such a system will bring about to
the marketing decision support system.
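
To make the TF/IDF weighting used in the first phase concrete, the following is a minimal sketch in
Python. It is an illustration only and does not reproduce the KEA toolkit or its trained model; the
function name tf_idf and the three toy documents are assumptions introduced for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized (non-empty) document.

    Hypothetical helper for illustration; not part of the KEA toolkit.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


# Toy documents, invented for this example.
docs = [
    "customer churn rises when price rises".split(),
    "customer feedback praises delivery".split(),
    "price discount drives customer sales".split(),
]
scores = tf_idf(docs)
# Highest-weighted term of the first document.
top_term = max(scores[0], key=scores[0].get)
```

A term that occurs in every document (here "customer") receives a weight of zero, whereas a term that is
frequent in one document and absent from the others (here "rises") is ranked highest for that document.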
The paper also describes a system, developed as the first phase of the research work, that evaluates the
association rules mined from structured data.
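
As an illustration of what mining association rules from structured data involves, the sketch below
enumerates rules of the form A -> B by brute force and keeps those that meet minimum support and
confidence thresholds. It is not the developed system; the names support and mine_rules, the thresholds
and the toy transactions are assumptions made for this example, and a practical system would use a
scalable algorithm such as Apriori.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules lhs -> rhs (illustrative only)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Toy transactions, invented for this example.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
```

Each returned tuple holds the antecedent, the consequent, the support of the pair and the confidence of
the rule.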
The proposed system is expected to improve the quality of decisions, and this will be evaluated using
standard metrics for the interestingness of association rules, which are based on statistical
independence and correlation analysis.
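
The text does not name the specific metrics. As a hedged illustration, lift and the phi coefficient are
two commonly used interestingness measures of this kind: lift compares the observed joint support of a
rule A -> B with the value expected if A and B were statistically independent, and phi is the
correlation coefficient between the occurrence of A and the occurrence of B. The function names and the
probabilities below are assumptions chosen for this sketch.

```python
import math


def lift(p_a, p_b, p_ab):
    """Observed joint support divided by the support expected under independence."""
    return p_ab / (p_a * p_b)


def phi(p_a, p_b, p_ab):
    """Phi (correlation) coefficient between the binary events A and B."""
    return (p_ab - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


# Independent items: joint support equals the product of the marginals.
independent = (lift(0.5, 0.5, 0.25), phi(0.5, 0.5, 0.25))  # (1.0, 0.0)

# Positively associated items: joint support exceeds the product of the marginals.
associated = (lift(0.5, 0.5, 0.4), phi(0.5, 0.5, 0.4))
```

A lift of 1 and a phi of 0 indicate independence, so a rule with those values carries no information
beyond the marginal frequencies of its items, however high its confidence may be.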