3 research outputs found

    Effective pattern discovery for text mining

    Get PDF
    Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase) based approaches should perform better than the term-based ones, but many experiments did not support this hypothesis. This paper presents an innovative technique, effective pattern discovery which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Substantial experiments on RCV1 data collection and TREC topics demonstrate that the proposed solution achieves encouraging performance

    Automatic pattern-taxonomy extraction for web mining

    Full text link
    In this paper, we propose a model for discovering frequent sequential patterns, phrases, which can be used as profile descriptors of documents. It is indubitable that we can obtain numerous phrases using data mining algorithms. However, it is difficult to use these phrases effectively for answering what users want. Therefore, we present a pattern taxonomy extraction model which performs the task of extracting descriptive frequent sequential patterns by pruning the meaningless ones. The model then is extended and tested by applying it to the information filtering system. The results of the experiment show that pattern-based methods outperform the keyword-based methods. The results also indicate that removal of meaningless patterns not only reduces the cost of computation but also improves the effectiveness of the system. <br /

    A pattern language for evolution reuse in component-based software architectures

    Get PDF
    Context: Modern software systems are prone to a continuous evolution under frequently varying requirements and changes in operational environments. Architecture-Centric Software Evolution (ACSE) enables changes in a system’s structure and behaviour while maintaining a global view of the software to address evolution-centric trade-offs. Lehman’s law of continuing change demands for long-living and continuously evolving architectures to prolong the productive life and economic value of software. Also some industrial research shows that evolution reuse can save approximately 40% effort of change implementation in ACSE process. However, a systematic review of existing research suggests a lack of solution(s) to support a continuous integration of reuse knowledge in ACSE process to promote evolution-off-the-shelf in software architectures. Objectives: We aim to unify the concepts of software repository mining and software evolution to discover evolution-reuse knowledge that can be shared and reused to guide ACSE. Method: We exploit repository mining techniques (also architecture change mining) that investigates architecture change logs to discover change operationalisation and patterns. We apply software evolution concepts (also architecture change execution) to support pattern-driven reuse in ACSE. Architecture change patterns support composition and application of a pattern language that exploits patterns and their relations to express evolution-reuse knowledge. Pattern language composition is enabled with a continuous discovery of patterns from architecture change logs and formalising relations among discovered patterns. Pattern language application is supported with an incremental selection and application of patterns to achieve reuse in ACSE. The novelty of the research lies with a framework PatEvol that supports a round-trip approach for a continuous acquisition (mining) and application (execution) of reuse knowledge to enable ACSE. Prototype support enables customisation and (semi-) automation for the evolution process. Results: We evaluated the results based on the ISO/IEC 9126 - 1 quality model and a case study based validation of the architecture change mining and change execution processes. We observe consistency and reusability of change support with pattern-driven architecture evolution. Change patterns support efficiency for architecture evolution process but lack a fine-granular change implementation. A critical challenge lies with the selection of appropriate patterns to form a pattern language during evolution. Conclusions: The pattern language itself continuously evolves with an incremental discovery of new patterns from change logs over time. A systematic identification and resolution of change anti-patterns define the scope for future research
    corecore