80 research outputs found

    Mining Coding Patterns to Detect Crosscutting Concerns in Java Programs

    15th Working Conference on Reverse Engineering (WCRE '08), 15-18 Oct. 2008, Antwerp

    Using Answer Set Programming for pattern mining

    Serial pattern mining consists of extracting the frequent sequential patterns from a single sequence of itemsets. This paper explores the ability of a declarative language, Answer Set Programming (ASP), to solve this task efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints. Comment: Intelligence Artificielle Fondamentale (2014)
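    The paper's encodings are declarative ASP programs; as a rough procedural illustration of the underlying task, the following Python sketch mines frequent serial patterns from a single sequence of itemsets. It assumes one simple support definition (the number of start positions admitting an ordered embedding); the paper's encodings may count support differently.

    ```python
    def supports(sequence, pattern):
        """Count start positions from which `pattern` embeds, in order,
        into `sequence` (a list of itemsets). One possible support
        definition for single-sequence mining."""
        count = 0
        for start in range(len(sequence)):
            i, pos = 0, start
            while pos < len(sequence) and i < len(pattern):
                if pattern[i] in sequence[pos]:
                    i += 1
                pos += 1
            if i == len(pattern):
                count += 1
        return count

    def frequent_patterns(sequence, minsup, maxlen=3):
        """Level-wise (Apriori-style) enumeration of frequent serial patterns."""
        items = sorted({x for itemset in sequence for x in itemset})
        level = [(x,) for x in items if supports(sequence, (x,)) >= minsup]
        frequent = list(level)
        while level and len(level[0]) < maxlen:
            level = [p + (x,) for p in level for x in items
                     if supports(sequence, p + (x,)) >= minsup]
            frequent.extend(level)
        return frequent
    ```

    An ASP encoding states the same search declaratively (choice rules generate candidate patterns, constraints prune infrequent ones), leaving enumeration to the solver.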

    HCPC: Human centric program comprehension by grouping static execution scenarios

    New members of a software team can struggle to locate user requirements if proper software engineering principles are not practiced. Reading through code and finding relevant methods, classes and files takes a significant portion of software development time, and developers often have to fix issues in code written by others. Good tool support for this code-browsing activity can reduce human effort and increase overall developer productivity. To support program comprehension, building an abstract code summary of a software system from its call graph is an active research area. A call graph is a representation of the caller-callee relationships between the methods of a software project, and it can be difficult to comprehend for a larger code-base. Our motivation is to extract the essence of the call graph by finding execution scenarios and clustering similar scenarios together, condensing the information in the code-base. Different techniques are then applied to label the nodes of the abstract code summary tree. In this thesis, we focus on static call graphs for creating the abstract code summary tree, as they cover all possible program scenarios and allow similar scenarios to be grouped together. Previous work on static call graphs clusters execution paths and uses only one information retrieval technique, without any feedback from developers. First, to advance existing work, we introduced new information retrieval techniques alongside a human-involved evaluation. We found that developers prefer node labels generated from terms in method names weighted with TF-IDF (term frequency-inverse document frequency). Second, based on our observations, we introduced two new types of information (textual descriptions using comments, and execution patterns) for abstraction nodes to provide a better overview. Finally, we introduced an interactive software tool which can be used to browse the code-base in a guided way by targeting specific units of the source code. In a user study, we found developers can use our tool to get an overview of a project and to find help with particular jobs, such as locating relevant files and understanding relevant domain knowledge.
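    The preferred labeling scheme (TF-IDF over terms from method names) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the cluster structure, term splitting and tie-breaking are assumptions.

    ```python
    import math
    import re

    def split_terms(method_name):
        """Split a camelCase/snake_case method name into lowercase terms."""
        spaced = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', method_name)
        return [t.lower() for t in re.split(r'[\s_]+', spaced) if t]

    def label_clusters(clusters, top_k=3):
        """clusters: dict cluster_id -> list of method names.
        Returns dict cluster_id -> top-k TF-IDF terms as a label,
        treating each cluster's bag of terms as one document."""
        docs = {cid: [t for m in methods for t in split_terms(m)]
                for cid, methods in clusters.items()}
        n = len(docs)
        df = {}                       # document frequency of each term
        for terms in docs.values():
            for t in set(terms):
                df[t] = df.get(t, 0) + 1
        labels = {}
        for cid, terms in docs.items():
            tf = {t: terms.count(t) / len(terms) for t in set(terms)}
            scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
            # highest score first; ties broken alphabetically
            ranked = sorted(scores, key=lambda t: (-scores[t], t))
            labels[cid] = ranked[:top_k]
        return labels
    ```

    Terms shared by every cluster get an IDF of zero, so labels naturally favour terms that distinguish one group of scenarios from the others.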

    Sequential pattern mining with uncertain data

    In recent years, a number of emerging applications, such as sensor monitoring systems, RFID networks and location-based services, have led to the proliferation of uncertain data. However, traditional data mining algorithms are usually inapplicable to uncertain data because of its probabilistic nature. Uncertainty has to be handled carefully; otherwise it can significantly degrade the quality of the underlying data mining applications. We therefore extend traditional data mining algorithms into uncertain versions so that they still produce accurate results. In particular, we use sequential pattern mining as a motivating example to illustrate how to incorporate uncertain information into the data mining process. We use possible-world semantics to interpret two typical types of uncertainty: tuple-level existential uncertainty and attribute-level temporal uncertainty. In an uncertain database, whether a pattern is frequent is itself probabilistic; we therefore define the concept of probabilistic frequent sequential patterns, and design various algorithms to mine them efficiently in uncertain databases. We also implement our algorithms on distributed computing platforms, such as MapReduce and Spark, so that they can be applied to large-scale databases. Our work also covers uncertainty computation in supervised machine learning algorithms. We develop an artificial neural network to classify numeric uncertain data, and design a Naive Bayesian classifier for categorical uncertain data streams. We also propose a discretization algorithm to pre-process numerical uncertain data, since many classifiers work with categorical data only. Experimental results on both synthetic and real-world uncertain datasets demonstrate that our methods are effective and efficient.
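    Under possible-world semantics with tuple-level existential uncertainty, the support of a pattern follows a Poisson-binomial distribution, and a pattern is probabilistically frequent when P(support >= minsup) exceeds a threshold. A minimal sketch of that probability computation, assuming independent existence probabilities for the sequences that would support the pattern (a common simplification, not necessarily the thesis's exact model):

    ```python
    def prob_frequent(probabilities, minsup):
        """P(support >= minsup) when supporting sequence i exists
        independently with probability probabilities[i].

        Dynamic program over the Poisson-binomial distribution:
        dist[k] = P(exactly k supporting sequences seen so far).
        """
        dist = [1.0]
        for p in probabilities:
            new = [0.0] * (len(dist) + 1)
            for k, q in enumerate(dist):
                new[k] += q * (1 - p)      # sequence absent
                new[k + 1] += q * p        # sequence present
            dist = new
        return sum(dist[minsup:])
    ```

    The DP runs in O(n * minsup-free) quadratic time in the number of sequences, which is what makes exact probabilistic frequentness tractable without enumerating the exponentially many possible worlds.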

    Fast implementation of pattern mining algorithms with time stamp uncertainties and temporal constraints

    Pattern mining is a powerful tool for analysing big datasets. Temporal datasets include time as an additional parameter, which complicates the algorithmic formulation and can make such data challenging to process quickly and efficiently. In addition, errors or uncertainty can exist in the timestamps of the data, for example in manually recorded health data. Sometimes we wish to find patterns only within a certain temporal range, and in some cases real-time processing and decision-making may be desirable. All these issues increase algorithmic complexity, processing times and storage requirements. Moreover, it may not be possible to store or process confidential data on public clusters or the cloud, which can be accessed by many people; it is therefore desirable to optimise algorithms for standalone systems. In this paper we present an integrated approach for writing efficient code for pattern mining problems. The approach includes: (1) cleaning datasets by removing infrequent events; (2) a new scheme for time-series data storage; (3) exploiting prior information about a dataset when available; (4) vectorisation and multicore parallelisation. We present two new algorithms, FARPAM (FAst Robust PAttern Mining) and FARPAMp (FARPAM with prior information about the uncertainty, allowing faster searching). The algorithms are applicable to a wide range of temporal datasets. They implement a new formulation of the pattern-searching function which reproduces and extends existing algorithms (such as SPAM and RobustSPAM) and allows significantly faster calculation. The algorithms also include an option for temporal restrictions in patterns, which is available in neither SPAM nor RobustSPAM. The searching algorithm is designed to be flexible for further possible extensions.
    The algorithms are coded in C++, and are highly optimised and parallelised for a modern standalone multicore workstation, thus avoiding the security issues connected with transferring confidential data onto clusters. FARPAM has been successfully tested on a publicly available weather dataset and on a confidential adult social care dataset, reproducing the results obtained by previous algorithms in both cases. It has been profiled against the widely used SPAM algorithm (for sequential pattern mining) and RobustSPAM (developed for datasets with errors in time points). The new algorithm outperforms SPAM by up to 20 times and RobustSPAM by up to 6,000 times, with better scalability in both cases.
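    The temporal restriction mentioned above, finding a pattern only within a bounded time window, can be illustrated with a short sketch. FARPAM itself is an optimised C++ implementation; this Python fragment only shows the kind of constrained occurrence check involved, and the interface (`max_span` as a window bound over timestamped events) is a hypothetical simplification.

    ```python
    def occurs_within(events, pattern, max_span):
        """Return True if `pattern` (a tuple of items) occurs in order in
        `events` (a time-sorted list of (timestamp, item) pairs) within a
        window of at most `max_span` time units from its first item."""
        for start, (t0, first) in enumerate(events):
            if first != pattern[0]:
                continue                      # anchor on the first item
            matched = 1
            for t, item in events[start + 1:]:
                if t - t0 > max_span:
                    break                     # temporal restriction violated
                if matched < len(pattern) and item == pattern[matched]:
                    matched += 1
            if matched >= len(pattern):
                return True
        return False
    ```

    Pruning on the window bound as soon as `t - t0` exceeds `max_span` is the key saving: candidate extensions outside the window are never examined, which is one reason a constrained search can beat an unconstrained one.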