8 research outputs found

    Textual data mining applications for industrial knowledge management solutions

    In recent years, knowledge has become an important resource for enhancing business, and many activities are required to manage knowledge resources well and help companies remain competitive within industrial environments. The data available in most industrial setups is complex in nature, and multiple data formats may be generated to track the progress of different projects, whether related to developing new products or providing better services to customers. Knowledge discovery from different databases requires considerable effort, and data mining techniques serve this purpose by handling structured data formats. If, however, the data is semi-structured or unstructured, the combined efforts of data and text mining technologies may be needed to bring fruitful results. This thesis focuses on issues related to the discovery of knowledge from semi-structured or unstructured data formats through the application of textual data mining techniques to automate the classification of textual information into two different categories or classes, which can then be used to help manage the knowledge available in multiple data formats. Applications of different data mining techniques to discover valuable information and knowledge from manufacturing or construction industries have been explored as part of a literature review. The application of text mining techniques to handle semi-structured or unstructured data has been discussed in detail. A novel integration of different data and text mining tools has been proposed in the form of a framework in which knowledge discovery and its refinement processes are performed through the application of Clustering and Apriori Association Rule Mining algorithms. Finally, the hypothesis of acquiring better classification accuracies has been detailed through the application of the methodology on case study data available in the form of Post Project Review (PPR) reports.
The process of discovering useful knowledge, its interpretation and utilisation has been automated to classify the textual data into two classes.
EThOS - Electronic Theses Online Service, GB, United Kingdo
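
The abstract above names Apriori association rule mining as one step of its framework. As a rough illustration only, and not the thesis's own implementation, the sketch below mines frequent term sets and rules from a handful of made-up term sets. The function names `apriori` and `association_rules`, the sample terms, and the thresholds are all assumptions introduced for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates whose relative support is high enough.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset (downward closure).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for left in combinations(itemset, size):
                left = frozenset(left)
                # Every subset of a frequent itemset is itself in `frequent`.
                confidence = support / frequent[left]
                if confidence >= min_confidence:
                    rules.append((left, itemset - left, confidence))
    return rules


# Hypothetical term sets, standing in for terms extracted from review reports.
reports = [
    {"delay", "cost", "overrun"},
    {"delay", "cost"},
    {"delay", "design"},
    {"cost", "overrun"},
]
frequent_sets = apriori(reports, min_support=0.5)
for left, right, conf in association_rules(frequent_sets, min_confidence=0.9):
    print(sorted(left), "->", sorted(right), f"(confidence {conf:.2f})")
```

Support here is the fraction of term sets containing an itemset, and confidence is the support of the whole itemset divided by the support of the rule's left-hand side.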

    Using Association Rules to Enrich Arabic Ontology

    In this article, we propose the use of a minimal generic base of association rules between terms to automatically enrich an existing domain ontology. Initially, non-redundant association rules between terms are extracted from an Arabic corpus. Then, the candidate terms are matched by comparing the concepts of the initial ontology with the premises of the association rules, using three distance measures that we define.

    Discovering core terms for effective short text clustering

    This thesis aims to address the current limitations in short text clustering and provides a systematic framework that includes three novel methods to effectively measure the similarity of two short texts, efficiently group short texts, and dynamically cluster short text streams.

    Natural language processing meets business: algorithms for mining meaning from corporate texts

    Semantic Frameworks for Document and Ontology Clustering

    Title from PDF of title page, viewed on January 20, 2011. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographic references (pages 194-202). Dissertation (Ph.D)--School of Computing and Engineering. University of Missouri--Kansas City, 2010.
    The Internet has made it possible, in principle, for scientists to quickly find research papers of interest. In practice, the overwhelming volume of publications makes this a time-consuming task. It is, therefore, important to develop efficient ways to identify related publications. Clustering, a technique used in many fields, is one way to facilitate this. Ontologies can also help in addressing the problem of finding related entities, including research publications. However, the development of new methods of clustering has focused mainly on the algorithm per se, with relatively less emphasis on feature selection and similarity measures. The latter can significantly impact the accuracy of clustering, as well as the runtime of clustering. Also, to fully realize the high resolution searches that ontologies can make possible, an important first step is to find automatic ways to cluster related ontologies. The major contribution of this dissertation is an innovative semantic framework for document clustering, called Citonomy, a dynamic approach that (1) exploits citation semantics of scientific documents, (2) deals with evolving datasets of documents, and (3) addresses the interplay between algorithms, feature selections, and similarity measures in an integrated manner. This improves accuracy and runtime performance over existing clustering algorithms. As the first step in Citonomy, we propose a new approach to extract and build a model for citation semantics. Both subjective and objective evaluations prove the effectiveness of this model in extracting citation semantics.
For the clustering stage, the Citonomy framework offers two approaches: (1) CS-VS: Combining Citation Semantics and VSM (Vector Space Model) Measures and (2) CS2CS: From Citation Semantics to Cluster Semantics. CS2CS is a document clustering algorithm with a 3-level feature selection process. It is an improvement over CS-VS in several aspects: i) removing the requirement of a training step, ii) introducing an advanced feature selection mechanism, and iii) supporting dynamic and adaptive clustering of new datasets. Compared to traditional document clustering, CS-VS and CS2CS significantly improve the accuracy of clustering by 5-15% (on average) in terms of the F-Measure. CS2CS is a linear clustering algorithm that is faster than the common document clustering algorithms K-Means and K-Medoids. In addition, it overcomes a major drawback of K-Means/Medoids algorithms in that the number of clusters can be dynamically determined by splitting and merging clusters. Fuzzy clustering with this approach has also been investigated. The related problem of ontology clustering is also addressed in this dissertation. Another semantic framework, InterOBO, has been designed for ontology clustering. A prototype to demonstrate the potential use of this framework has been developed. The Open Biomedical Ontologies (OBOs) are used as a case study to illustrate the clustering technique used to identify common concepts and links. Detailed experimental results on different data sets are given to show the merits of the proposed clustering algorithms.
Abstract -- List of Illustrations -- List of Tables -- Acknowledgments -- Introduction -- Review of Literature -- Overall Framework - Citonomy -- CS-VS - Combining Citation Semantics and VSM Measures -- CS2CS - From Citation Semantics to Cluster Semantics -- Interobo: A Framework for Knowledge Sharing in Biomedical Domain -- Experimental Results and Discussion -- Summary and Future Work -- Appendix -- Reference List -- Vita
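
The abstract above reports clustering accuracy in terms of the F-Measure but does not spell out the formula. The sketch below shows one commonly used definition for clustering, the class-weighted best-match F-measure, on the assumption that something similar is meant; the dissertation's exact variant may differ. The function name `clustering_f_measure` and the sample labels are made up for this example.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-weighted best-match F-measure of a clustering against true classes."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # overlap[(cls, clu)] = number of documents of class cls placed in cluster clu
    overlap = Counter(zip(true_labels, cluster_ids))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        # Each class contributes its best-matching cluster, weighted by class size.
        total += (n_i / n) * best
    return total


# Hypothetical labels: four documents, two true classes.
print(clustering_f_measure(["a", "a", "b", "b"], [0, 0, 1, 1]))
print(clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0]))
```

A clustering that matches the true classes exactly scores 1.0, while putting every document into a single cluster lowers precision and therefore the score.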

    Discovering Word Meanings Based on Frequent Termsets

    No full text