9 research outputs found

    Improvement and Evaluation of Data Consistency Metric CIL for Software Engineering Data Sets

    Software data sets derived from actual software products and their development processes are widely used for project planning, management, quality assurance, process improvement, etc. Although it has been demonstrated that certain data sets are not fit for these purposes, data quality is often not assessed before a data set is used. The principal reason is that few metrics quantify the fitness of software development data. This study aims to fill that gap by devising a new and efficient method for assessing data quality. As a starting point, we take the Case Inconsistency Level (CIL), which counts the number of inconsistent project pairs in a data set to evaluate its consistency. Based on a follow-up evaluation with a large sample set, we show that CIL is not effective in evaluating the quality of certain data sets. By analyzing the problems associated with CIL and eliminating them, we propose an improved metric called the Similar Case Inconsistency Level (SCIL). Our empirical evaluation with 54 data samples derived from six large project data sets shows that SCIL can distinguish between consistent and inconsistent data sets, and that prediction models for software development effort and productivity built from consistent data sets indeed achieve relatively higher accuracy.
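    The core idea of a similar-case inconsistency metric can be sketched as follows: among pairs of projects whose feature vectors are similar, count those whose recorded efforts diverge widely. This is an illustrative reconstruction, not the paper's exact definition; the function name, distance measure, and both thresholds are assumptions.

    ```python
    import math

    def scil(features, effort, sim_threshold=0.1, incons_ratio=2.0):
        """Sketch of a Similar Case Inconsistency Level style metric.

        features: list of numeric feature tuples, one per project
        effort:   list of recorded efforts, one per project
        A pair is 'similar' if the Euclidean distance between feature
        vectors is small, and 'inconsistent' if the larger effort exceeds
        the smaller by at least incons_ratio. Thresholds are illustrative.
        """
        n = len(effort)
        similar = inconsistent = 0
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(features[i], features[j]) <= sim_threshold:
                    similar += 1
                    hi = max(effort[i], effort[j])
                    lo = min(effort[i], effort[j])
                    if lo > 0 and hi / lo >= incons_ratio:
                        inconsistent += 1
        # Fraction of similar project pairs that are inconsistent
        return inconsistent / similar if similar else 0.0
    ```

    Restricting the count to similar pairs is what distinguishes an SCIL-style metric from plain CIL: dissimilar projects with different efforts are expected and should not be penalized.
    
    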

    Empirical Evaluation of Mimic Software Project Data Sets for Software Effort Estimation

    To conduct empirical research on industrial software development, it is necessary to obtain data from real software projects in industry. However, only a few such industry data sets are publicly available, and unfortunately, most of them are very old. In addition, most of today's software companies cannot make their data open, because software development involves many stakeholders and data confidentiality must therefore be strongly preserved. To address this, this study proposes a method for artificially generating a “mimic” software project data set whose characteristics (such as mean, standard deviation and correlation coefficients) are very similar to those of a given confidential data set. Instead of using the original (confidential) data set, researchers can use the mimic data set to produce results similar to those obtained from the original. The proposed method uses the Box-Muller transform to generate normally distributed random numbers, and exponential transformation and value reordering for data mimicry. To evaluate the efficacy of the proposed method, effort estimation is considered as a potential application domain for mimic data. Estimation models are built from 8 reference data sets and their corresponding mimic data sets. Our experiments confirmed that models built from mimic data sets show effort estimation performance similar to that of models built from the original data sets, which indicates the capability of the proposed method to generate representative samples.
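    The three ingredients named in the abstract (Box-Muller sampling, exponential transformation, and reordering) can be combined in a minimal sketch for a single data column. This is an assumed interpretation of the pipeline, not the paper's exact procedure; in particular, matching the log-scale mean/standard deviation and rank-reordering against the original column are illustrative choices.

    ```python
    import math
    import random

    def box_muller(n, mu=0.0, sigma=1.0, rng=random):
        """Generate n normally distributed values via the Box-Muller transform."""
        out = []
        while len(out) < n:
            u1 = 1.0 - rng.random()  # avoid log(0)
            u2 = rng.random()
            r = math.sqrt(-2.0 * math.log(u1))
            out.append(mu + sigma * r * math.cos(2.0 * math.pi * u2))
            if len(out) < n:
                out.append(mu + sigma * r * math.sin(2.0 * math.pi * u2))
        return out

    def mimic_column(original, rng=random):
        """Sketch of mimicking one positive-valued data column:
        1) fit mean/sd on the log scale,
        2) draw normal values (Box-Muller) and exponentiate them back,
        3) reorder the synthetic values to follow the original column's
           rank order, so cross-column correlations are roughly preserved.
        """
        logs = [math.log(x) for x in original]  # assumes positive data
        mu = sum(logs) / len(logs)
        sd = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
        synth = sorted(math.exp(v) for v in box_muller(len(original), mu, sd, rng))
        order = sorted(range(len(original)), key=lambda i: original[i])
        mimic = [0.0] * len(original)
        for rank, idx in enumerate(order):
            mimic[idx] = synth[rank]  # smallest synthetic where smallest original sat
        return mimic
    ```

    The rank-reordering step is what lets each mimic column keep the original column's ordering pattern without exposing any original value.
    
    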

    Discovering Utility-driven Interval Rules

    In artificial intelligence, high-utility sequential rule mining (HUSRM) is a knowledge discovery method that can reveal associations between events in sequences. Recently, many methods have been proposed to discover high-utility sequential rules. However, the existing methods all address point-based sequences, whereas interval events, which persist over a period of time, are common in practice. Traditional knowledge discovery tasks on interval-event sequences mainly focus on pattern discovery, but patterns cannot reveal the correlations between interval events well. Moreover, existing HUSRM algorithms cannot be directly applied to interval-event sequences, since the relations in interval-event sequences are much more intricate than those in point-based sequences. In this work, we propose a utility-driven interval rule mining algorithm (UIRMiner) that can extract all utility-driven interval rules (UIRs) from an interval-event sequence database. In UIRMiner, we first introduce a numeric encoding of interval relations, which saves much of the time spent computing relations and much of the storage needed to represent them. Furthermore, to shrink the search space, we propose a complement pruning strategy, which incorporates the utility upper bound with the relation. Finally, extensive experiments on both real-world and synthetic datasets verify that UIRMiner is an effective and efficient algorithm.
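    A numeric encoding of interval relations can be illustrated with a small sketch: the thirteen Allen relations between two intervals are fully determined by four endpoint comparisons, so packing those comparisons into base-3 digits yields a single integer per pair. This is a hypothetical encoding in the spirit of the abstract; the paper's actual representation may differ.

    ```python
    def allen_code(a, b):
        """Encode the temporal relation between intervals a and b
        (each a (start, end) pair) as one integer.

        Four three-way endpoint comparisons determine the Allen relation,
        so the base-3 code below is unique per relation. Illustrative
        sketch, not UIRMiner's actual encoding.
        """
        (s1, e1), (s2, e2) = a, b

        def cmp(x, y):
            return 0 if x < y else (1 if x == y else 2)

        return (cmp(s1, s2) * 27 + cmp(s1, e2) * 9
                + cmp(e1, s2) * 3 + cmp(e1, e2))
    ```

    Comparing two integers is far cheaper than re-deriving a relation from four endpoints each time, which is the kind of saving in computation and storage the abstract alludes to; only 13 of the 81 possible codes correspond to valid interval configurations.
    
    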

    Neg/pos-Normalized Accuracy Measures for Software Defect Prediction

    In evaluating the performance of software defect prediction models, accuracy measures such as precision and recall are commonly used. However, most of these measures are affected by the neg/pos ratio of the data set being predicted, where neg is the number of negative cases (defect-free modules) and pos is the number of positive cases (defective modules). It is therefore not fair to compare such values across data sets with different neg/pos ratios, and doing so may even lead to misleading or contradictory conclusions. The objective of this study is to address this class imbalance issue in assessing the performance of defect prediction models. The proposed method relies on computing the expected values of accuracy measures based solely on the neg and pos values of the data set. From the expected values, we derive neg/pos-normalized accuracy measures, defined as a measure's divergence from its expected value divided by the standard deviation over all possible prediction outcomes. The proposed measures enable a ranking of predictions across different data sets that can distinguish successful predictions from unsuccessful ones. Our results from a case study of defect prediction on 19 defect data sets indicate that this ranking differs significantly from the rankings produced by conventional accuracy measures such as precision and recall, as well as by the composite measures F1-value, AUC of ROC, MCC, G-mean and Balance. In addition, we conclude that MCC captures defect prediction accuracy better than F1-value, AUC of ROC, G-mean and Balance.
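    The normalization idea ((observed - expected) / standard deviation, with both moments determined by neg and pos alone) can be sketched for precision. The paper derives these moments analytically; the sketch below instead estimates them by simulating random predictions, so the function name, the 0.5 flagging probability, and the Monte Carlo approach are assumptions for illustration.

    ```python
    import random
    import statistics

    def normalized_precision(precision, pos, neg, trials=2000, rng=None):
        """Sketch of a neg/pos-normalized accuracy measure.

        Estimate the mean and standard deviation of precision over random
        predictions on a data set with the given pos/neg counts, then
        report how many standard deviations the observed precision lies
        above that expectation. Illustrative, not the paper's derivation.
        """
        rng = rng or random.Random(0)
        labels = [1] * pos + [0] * neg
        samples = []
        for _ in range(trials):
            # Random prediction: flag each module defective with p = 0.5
            preds = [rng.random() < 0.5 for _ in labels]
            tp = sum(1 for l, p in zip(labels, preds) if l and p)
            fp = sum(1 for l, p in zip(labels, preds) if not l and p)
            if tp + fp:
                samples.append(tp / (tp + fp))
        mean = statistics.mean(samples)
        sd = statistics.stdev(samples)
        return (precision - mean) / sd
    ```

    Because the result is expressed in standard deviations above chance for this particular neg/pos ratio, scores become comparable across data sets with different class imbalances, which raw precision is not.
    
    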