10 research outputs found

    Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification

    Get PDF
    This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve classification of text documents into predefined categories. We evaluate different measures of similarity, five derived from the citation structure of the collection, and three measures derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our empirical experiments using documents from the ACM digital library and the ACM classification scheme show that we can discover similarity functions that work better than any evidence in isolation and whose combined performance through a simple majority voting is comparable to that of Support Vector Machine classifiers

    Clustering Blog Information

    Get PDF
    Blogs form an important source of information in today’s internet world. There are blogs on different topics such as technical, health, electronic gadgets, shopping, etc. However, most of the blog websites have the blogs arranged in chronological order rather than its contents. Such arrangement of blogs makes it difficult for the user searching information about a particular topic from the blog. To resolve this problem, we propose an approach to cluster the blogs based on its content. We studied several clustering algorithms available. The objective of this report is to understand various steps involved in clustering blog information and working of clustering algorithms which best fits in text clustering and improving clustering results by reducing its drawbacks. The report demonstrates a comparison between the K-Means, Vector Space Model (VSM), Latent Semantic Indexing (LSI), and Fuzzy C-Means (FCM) clustering algorithms discussed through the paper and selects the optimum algorithm for blog clustering. The paper proposes modification of selected optimum algorithm to get required blog clustering. A comparison table of the selected algorithm and modified algorithm is included at the end of the paper and the results showed the proposed modified algorithm has performed overall better compared to other clustering algorithms

    TEXTUAL DATA MINING FOR NEXT GENERATION INTELLIGENT DECISION MAKING IN INDUSTRIAL ENVIRONMENT: A SURVEY

    Get PDF
    This paper proposes textual data mining as a next generation intelligent decision making technology for sustainable knowledge management solutions in any industrial environment. A detailed survey of applications of Data Mining techniques for exploiting information from different data formats and transforming this information into knowledge is presented in the literature survey. The focus of the survey is to show the power of different data mining techniques for exploiting information from data. The literature surveyed in this paper shows that intelligent decision making is of great importance in many contexts within manufacturing, construction and business generally. Business intelligence tools, which can be interpreted as decision support tools, are of increasing importance to companies for their success within competitive global markets. However, these tools are dependent on the relevancy, accuracy and overall quality of the knowledge on which they are based and which they use. Thus the research work presented in the paper uncover the importance and power of different data mining techniques supported by text mining methods used to exploit information from semi-structured or un-structured data formats. A great source of information is available in these formats and when exploited by combined efforts of data and text mining tools help the decision maker to take effective decision for the enhancement of business of industry and discovery of useful knowledge is made for next generation of intelligent decision making. Thus the survey shows the power of textual data mining as the next generation technology for intelligent decision making in the industrial environment

    Learning to rank

    Get PDF
    Abstract. New general purpose ranking functions are discovered using genetic programming. The TREC WSJ collection was chosen as a training set. A baseline comparison function was chosen as the best of inner product, probability, cosine, and Okapi BM25. An elitist genetic algorithm with a population size 100 was run 13 times for 100 generations and the best performing algorithms chosen from these. The best learned functions, when evaluated against the best baseline function (BM25), demonstrate some significant performance differences, with improvements in mean average precision as high as 32% observed on one TREC collection not used in training. In no test is BM25 shown to significantly outperform the best learned function

    Textual data mining applications for industrial knowledge management solutions

    Get PDF
    In recent years knowledge has become an important resource to enhance the business and many activities are required to manage these knowledge resources well and help companies to remain competitive within industrial environments. The data available in most industrial setups is complex in nature and multiple different data formats may be generated to track the progress of different projects either related to developing new products or providing better services to the customers. Knowledge Discovery from different databases requires considerable efforts and energies and data mining techniques serve the purpose through handling structured data formats. If however the data is semi-structured or unstructured the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to discovery of knowledge from semi-structured or unstructured data formats through the applications of textual data mining techniques to automate the classification of textual information into two different categories or classes which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge from manufacturing or construction industries have been explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data has been discussed in detail. A novel integration of different data and text mining tools has been proposed in the form of a framework in which knowledge discovery and its refinement processes are performed through the application of Clustering and Apriori Association Rule of Mining algorithms. Finally the hypothesis of acquiring better classification accuracies has been detailed through the application of the methodology on case study data available in the form of Post Project Reviews (PPRs) reports. The process of discovering useful knowledge, its interpretation and utilisation has been automated to classify the textual data into two classes.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    Get PDF
    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held in the Technological University Dublin; Dublin, Ireland; on the 29th to the 31st August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI)
    corecore