
    You've Got Email: A Workflow Management Extraction System

    Email is one of the most powerful tools for communication. Many businesses use email as their main communication channel, so email content may contain substantial business data. To help businesses grow faster, a workflow management system may be required, and the data gathered from email content could be a robust source for such a system. This research proposes an email extraction system that extracts data from any incoming email into suitable database fields. The database created by the program is designed to support the implementation of a workflow management system. The research is presented in three phases: (1) defining suitable criteria for extracting data; (2) implementing a program that extracts the data and stores it in a database; and (3) implementing a program that validates the data in the database. Four criteria are applied in the email extraction system. The first criterion selects contact information at the end of the email content; the second selects specified keywords, such as tel, email, and mobile; the third selects unique names that start with a capital letter, such as the names of people, places, and corporations; and the fourth selects special texts, such as Co. Ltd, .com, and www. The empirical results suggest that when all four criteria are applied, the program's accuracy and the percentage of blank fields are at an acceptable level compared with results obtained using other criteria. When all four criteria are applied to 7,340 English emails, the accuracy is approximately 68.66%, while the percentage of blank fields in the database is approximately 68.05%. The database created by the experiment can be applied in a workflow management system.
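    The four criteria lend themselves to simple pattern matching. The sketch below is a minimal, hypothetical illustration in Python, not the authors' program: the six-line signature window (`SIGNATURE_LINES`), the regular expressions, the field names, and the helpers `extract_fields` and `blank_ratio` are all assumptions chosen for the example. Fields that no criterion fills are left as `None`, which is what a blank-field percentage would count.

```python
import re

# Criterion 1 (assumed window): contact details usually sit in the last lines.
SIGNATURE_LINES = 6  # hypothetical value

# Criterion 2: keyword-anchored fields; the email field uses the address shape.
KEYWORDS = {
    "tel": re.compile(r"\btel\.?\s*:?\s*(\+?\d[\d ()-]{5,}\d)", re.I),
    "mobile": re.compile(r"\bmobile\s*:?\s*(\+?\d[\d ()-]{5,}\d)", re.I),
    "email": re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
}

# Criterion 3: runs of two or more capitalized words on one line.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

# Criterion 4: special texts such as "Co. Ltd", ".com" and "www".
COMPANY = re.compile(r"\b((?:[A-Z][\w&]* )+Co\.,? ?Ltd\.?)")
WEBSITE = re.compile(r"\b(www\.[\w-]+(?:\.[\w-]+)+|[\w-]+\.com)\b", re.I)


def extract_fields(body: str) -> dict:
    """Extract one record per email; unfilled fields stay None (blank)."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    tail = "\n".join(lines[-SIGNATURE_LINES:])  # criterion 1
    fields = {"name": None, "company": None, "tel": None,
              "mobile": None, "email": None, "website": None}
    for field, pattern in KEYWORDS.items():
        m = pattern.search(tail)
        if m:
            fields[field] = m.group(1).strip()
    m = COMPANY.search(tail)
    if m:
        fields["company"] = m.group(1)
    m = WEBSITE.search(tail)
    if m:
        fields["website"] = m.group(1)
    # Take the first capitalized run that is not part of the company name.
    for cand in CAPITALIZED.findall(tail):
        if not fields["company"] or cand not in fields["company"]:
            fields["name"] = cand
            break
    return fields


def blank_ratio(records) -> float:
    """Share of fields left blank (None) across all extracted records."""
    cells = [v for rec in records for v in rec.values()]
    return sum(v is None for v in cells) / len(cells) if cells else 0.0
```

    For instance, applied to a message whose last lines are "John Smith", "Acme Trading Co. Ltd", "Tel: +66 2 123 4567", "Mobile: 081-234-5678", "Email: john.smith@acme.co.th" and "www.acme.co.th", the sketch fills all six fields, while a one-line reply such as "Thanks!" yields only blank fields. Real signatures vary far more than these illustrative patterns assume.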

    A semantic partition based text mining model for document classification


    Comparison of edit history clustering techniques for spatial hypertext

    History mechanisms available in hypertext systems allow access to past user interactions with the system. This helps users evaluate past work and learn from past activity. It also allows systems to identify usage patterns and potentially predict user behavior. Thus, recording history is useful to both the system and the user. Various tools and techniques have been developed to group and annotate history in Visual Knowledge Builder (VKB), but these tools require the operations to be performed manually. For a large VKB history that grows over a long period of time, grouping with such tools is difficult and time-consuming. This thesis examines methods for analyzing VKB history in order to automatically group (cluster) all the user events in it. Three approaches are compared. The first is a pattern-matching approach that identifies repeated patterns of edit events in the history. The second is a rule-based approach that uses simple rules, such as grouping all consecutive events on a single object. The third uses hierarchical agglomerative clustering (HAC), in which edits are grouped based on a function of edit time and edit location. The contributions of this thesis are: (a) developing tools that automatically cluster large VKB histories using these approaches; (b) analyzing the performance of each approach to determine its relative strengths and weaknesses; and (c) answering the question of how well the automatic clustering approaches perform, by comparing their results with the manual grouping performed by actual users on the same set of VKB history. The results show that the rule-based approach performs best: it most closely matches human-defined groups and generates the fewest groups. The hierarchical agglomerative clustering approach falls between the other two approaches with regard to identifying human-defined groups. The pattern-matching approach generates many potential groups, but only a few of them match those created by actual VKB users.
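    The rule-based and HAC approaches can be sketched in a few lines of Python. This is a hypothetical illustration, not the thesis's tool: the `Edit` record, the weighted time-plus-distance function with `w_time` and `w_space`, single linkage, the stopping `threshold`, and the exact-match score `matched_groups` are all assumptions. The abstract states only that HAC groups edits by a function of edit time and edit location, and that results were compared with human-defined groups.

```python
import math
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit event: timestamp (s), canvas position, edited object."""
    t: float
    x: float
    y: float
    obj: str


def rule_based_groups(history):
    """Group consecutive events on a single object (history sorted by time)."""
    return [list(g) for _, g in groupby(history, key=lambda e: e.obj)]


def edit_distance(a, b, w_time=1.0, w_space=0.1):
    """Assumed distance: weighted time gap plus weighted spatial gap."""
    return w_time * abs(a.t - b.t) + w_space * math.hypot(a.x - b.x, a.y - b.y)


def hac_groups(history, threshold=10.0):
    """Naive single-linkage HAC; stop once the closest pair exceeds threshold."""
    clusters = [[e] for e in history]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(edit_distance(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return [sorted(c, key=lambda e: e.t) for c in clusters]


def matched_groups(auto, human):
    """Assumed score: number of automatic groups equal to some human group."""
    human_sets = {frozenset(g) for g in human}
    return sum(frozenset(g) in human_sets for g in auto)
```

    With four edits, two on object `a` and two on object `b`, where the last edit happens much later and far away, the rule-based function yields two groups of two edits each, while the HAC sketch merges the first three edits and leaves the distant fourth on its own. Single linkage is used only because it is the simplest linkage to write; the thesis's linkage criterion is not given here.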

    Rule-based Word Clustering for Text Classification

    This paper introduces a rule-based, context-dependent word clustering method, with rules derived from various domain databases and from the orthographic properties of words. Our experiments show that, besides significantly reducing dimensionality, such rule-based word clustering improves the overall accuracy of extracting bibliographic fields from references by 8%, and improves class-specific performance on the line classification of document headers by 18.32% on average.
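    Rule-based word clustering can be pictured as mapping each token to a cluster label before classification, so that many distinct words collapse into a few features. The Python sketch below is hypothetical: the tiny `DOMAIN_DB` word lists stand in for the paper's domain databases, and the labels and orthographic rules (`<YEAR>`, `<INITIAL>`, `<CAPWORD>`, and a page range recognized only after `pp.` to show context dependence) are illustrative assumptions rather than the paper's rule set.

```python
import re

# Hypothetical stand-ins for domain databases; real ones would be far larger.
DOMAIN_DB = {
    "<FIRSTNAME>": {"john", "mary", "wei"},
    "<MONTH>": {"jan", "january", "feb", "february", "dec", "december"},
    "<PUBWORD>": {"proceedings", "journal", "conference", "press"},
}


def cluster_token(tok: str, prev: str) -> str:
    """Map a token to a cluster label; `prev` (previous token) is the context."""
    low = tok.lower().strip(".,;:()")
    for label, words in DOMAIN_DB.items():  # database-derived rules first
        if low in words:
            return label
    if re.fullmatch(r"(19|20)\d\d", low):
        return "<YEAR>"
    # Context-dependent rule: a number range counts as pages only after "pp."
    if re.fullmatch(r"\d+-\d+", low) and prev.lower() in {"pp.", "pages"}:
        return "<PAGERANGE>"
    if re.fullmatch(r"\d+", low):
        return "<NUM>"
    if re.fullmatch(r"[A-Z]\.", tok):  # orthographic: author initial
        return "<INITIAL>"
    if tok[:1].isupper():  # orthographic: capitalized word
        return "<CAPWORD>"
    return low  # ordinary words keep their own (lowercased) identity


def cluster_tokens(text: str) -> list:
    """Apply the rules left to right over whitespace-separated tokens."""
    out, prev = [], ""
    for tok in text.split():
        out.append(cluster_token(tok, prev))
        prev = tok
    return out
```

    Applied to the reference string `Smith, J. (2003). Proceedings of the Conference, pp. 12-34.`, the sketch maps "Smith," to `<CAPWORD>`, "J." to `<INITIAL>`, "(2003)." to `<YEAR>`, both "Proceedings" and "Conference," to `<PUBWORD>`, and "12-34." to `<PAGERANGE>` because it follows `pp.`; every distinct year or initial then shares a single feature.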