1,102 research outputs found

    CEAI: CCM based Email Authorship Identification Model

    In this paper we present a model for email authorship identification (EAI) that employs a cluster-based classification (CCM) technique. Stylometric features have traditionally been employed with success in various authorship analysis tasks; we extend the traditional feature set with further features that are effective for email authorship identification (e.g. the last punctuation mark used in an email, the tendency of an author to use capitalization at the start of an email, or the punctuation after a greeting or farewell). We also include content features selected by information gain. We observe that these features have a positive impact on the accuracy of authorship identification. We performed experiments to support our arguments and compared the results with other baseline models. The experimental results show that the proposed CCM-based email authorship identification model, together with the proposed feature set, outperforms state-of-the-art support vector machine (SVM)-based models as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy of 94% for 10 authors, 89% for 25 authors, and 81% for 50 authors on the Enron dataset, and 89.5% on a real email dataset constructed by the authors. The Enron results were obtained on a considerably larger number of authors than in the models proposed by Iqbal et al. [1, 2].
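    The three email-specific features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the greeting list, and the punctuation set are all assumptions made for illustration.

```python
import re

# Hypothetical greeting words; the abstract does not list the ones used.
GREETINGS = ("hi", "hello", "dear", "hey")
GREETING_RE = re.compile(
    r"\s*(?:" + "|".join(GREETINGS) + r")\b[^,.!:;]*([,.!:;])",
    re.IGNORECASE,
)


def email_style_features(body: str) -> dict:
    """Extract three illustrative stylometric features from an email body."""
    text = body.strip()

    # Last punctuation mark used in the email (None if there is none).
    marks = re.findall(r"[.!?,;:]", text)
    last_punct = marks[-1] if marks else None

    # Whether the email starts with a capital letter.
    starts_capitalized = bool(text) and text[0].isupper()

    # First punctuation mark after a greeting on the first line, if any.
    first_line = text.splitlines()[0] if text else ""
    match = GREETING_RE.match(first_line)
    greeting_punct = match.group(1) if match else None

    return {
        "last_punct": last_punct,
        "starts_capitalized": starts_capitalized,
        "greeting_punct": greeting_punct,
    }
```

    For example, `email_style_features("Hi John,\nplease send the report.\nThanks!")` records `"!"` as the last punctuation mark and `","` as the punctuation after the greeting. In a full pipeline such values would be encoded alongside the other stylometric and content features before classification.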

    ANALYSIS OF CUSTOMER SATISFACTION FOR COMPETITIVE ADVANTAGE USING CLUSTERING AND ASSOCIATION RULES

    Customer satisfaction is a very important factor in organizational profit, and positioning for effective competitive advantage requires making decisions based on quality inferences from data mining. The aim of this paper is to provide competitive advantage inferences by analyzing customer satisfaction data with a combination of k-means clustering and association rule mining. Based on information obtained from questionnaires administered to customers of mobile network service providers in Nigeria, prediction is done and inferences are generated with the help of clusters and association rules. The paper proposes an effective method for extracting knowledge from questionnaire data, which is very useful for improving the competitive advantage of organizations. In conclusion, the paper identifies the factors that contribute to customer satisfaction in the Nigerian mobile network sector.
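    As a rough illustration of the association-rule half of this approach, the sketch below computes the support and confidence of a single rule over the responses in one cluster. It assumes respondents have already been grouped (for instance by k-means) and that each response is encoded as a set of items; the function name and the item labels are hypothetical and do not come from the paper.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    transactions: iterable of item collections, one per questionnaire
    response in a cluster (hypothetical encoding).
    """
    antecedent, consequent = set(antecedent), set(consequent)
    rows = [set(t) for t in transactions]
    n = len(rows)

    # Responses containing the antecedent, and those containing both sides.
    n_ante = sum(1 for row in rows if antecedent <= row)
    n_both = sum(1 for row in rows if (antecedent | consequent) <= row)

    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Hypothetical responses from one cluster of respondents.
cluster = [
    {"good_coverage", "satisfied"},
    {"good_coverage", "satisfied"},
    {"good_coverage"},
    {"low_price"},
]
support, confidence = rule_metrics(cluster, {"good_coverage"}, {"satisfied"})
```

    Here the rule holds in two of four responses (support 0.5) and in two of the three responses that contain the antecedent (confidence about 0.67). Rules with high support and confidence inside a cluster are the kind of inference the abstract refers to.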

    What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing

    Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms lay an almost unbearable burden on software testing engineers, who have to manually analyze the causes of these alarms. The causes are critical because they decide which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve the burden by automating the procedure. Our approach, called Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes based on test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves an accuracy of 58.3% and 65.8%, respectively, which outperforms the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per cause analysis. Given these encouraging experimental results, our industrial partner, a leading information and communication technology company, has deployed the tool, which achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions. Comment: 12 pages
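    The abstract does not say which information retrieval technique the Cause Analysis Model uses. As one generic possibility, the sketch below labels a new test log with the cause of its most similar historical log, using TF-IDF weighting and cosine similarity. The function names, the whitespace tokenizer, and the cause labels are assumptions for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that every known term has a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def infer_cause(new_log, history):
    """Return the cause of the historical log most similar to new_log.

    history: non-empty list of (log_text, cause) pairs.
    """
    if not history:
        raise ValueError("history must contain at least one labeled log")
    vectors, idf = tfidf_vectors([log for log, _ in history])
    # Terms unseen in the history get zero weight in the query vector.
    counts = Counter(new_log.lower().split())
    query = {t: c * idf.get(t, 0.0) for t, c in counts.items()}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(history)), key=scores.__getitem__)
    return history[best][1]


# Hypothetical labeled history; the cause names are invented.
history = [
    ("connection timeout to database server", "environment issue"),
    ("assertion failed expected 3 got 4", "product code defect"),
    ("script syntax error in test case", "test script defect"),
]
cause = infer_cause("database connection timeout after 30s", history)
```

    With this toy history the new log shares terms only with the first entry, so `cause` is `"environment issue"`. Real test logs would need a better tokenizer and term selection than a plain whitespace split.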

    CROSS-LINGUAL TEXT CLASSIFICATION WITH MODEL TRANSLATION AND DOCUMENT TRANSLATION

    Most enterprise search engines employ data mining classifiers to classify documents. With economic globalization, many companies are starting to have overseas branches or divisions, and those branches use local languages in documents and emails. When a classifier tries to categorize documents in another language, a model trained on a single language will not work. The most direct solution is to machine-translate those documents into one language, but this solution suffers from the inaccuracy of machine translation, and the overhead is economically inefficient. Another approach is to translate the features extracted from one language into another language and use them to classify documents in that language. This approach is efficient but faces translation inaccuracy and a language and culture gap. In this project, the author proposes a new method that adopts both model translation and document translation. This method can take advantage of the best functionality of both the document translation and model translation methods.
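    The model-translation approach described above can be sketched for a simple linear bag-of-words classifier: the learned feature weights are mapped through a bilingual dictionary, and documents in the target language are then scored directly. This is only an illustration of that one approach, not the combined method the author proposes; the function names, weights, and dictionary entries are hypothetical.

```python
def translate_model(weights, dictionary):
    """Map source-language feature weights onto target-language terms.

    weights: dict of source term -> weight of a linear classifier.
    dictionary: dict of source term -> list of target-language terms.
    Source terms without a dictionary entry are dropped, which is one
    way translation gaps degrade the translated model.
    """
    translated = {}
    for term, weight in weights.items():
        for target in dictionary.get(term, []):
            translated[target] = translated.get(target, 0.0) + weight
    return translated


def score(weights, document):
    """Sum the weights of the terms occurring in a document."""
    return sum(weights.get(tok, 0.0) for tok in document.lower().split())


# Hypothetical English model for a "finance" class and a toy
# English-to-German dictionary.
english_weights = {"invoice": 2.0, "payment": 1.5, "holiday": -1.0}
en_to_de = {
    "invoice": ["rechnung"],
    "payment": ["zahlung"],
    "holiday": ["urlaub"],
}
german_weights = translate_model(english_weights, en_to_de)
finance_score = score(german_weights, "Rechnung und Zahlung")
```

    The German document is scored without being translated itself, which is why this route is cheap. Its accuracy depends entirely on the dictionary, matching the translation-inaccuracy concern raised in the abstract.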