8 research outputs found

    Examining Swarm Intelligence-based Feature Selection for Multi-Label Classification

    Multi-label classification addresses problems in which more than one class label is assigned to each instance. Many real-world multi-label classification tasks are high-dimensional due to digital technologies, which degrades the performance of traditional multi-label classifiers. Feature selection is a common and successful approach to tackling this problem: it reduces dimensionality by retaining relevant features and eliminating redundant ones. Several feature selection methods have been successfully applied in multi-label learning. Most of them are wrapper methods that employ a multi-label classifier in their search process. Because they run a classifier at each step, they incur a high computational cost and therefore suffer from scalability issues. To deal with this issue, filter methods evaluate feature subsets using information-theoretic criteria instead of running classifiers. This paper aims to provide a comprehensive review of the feature selection methods proposed for multi-label classification tasks. To this end, we investigate most of the well-known and state-of-the-art methods, describe the main characteristics of existing multi-label feature selection techniques, and compare them analytically.
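    To make the wrapper/filter contrast concrete, below is a minimal, hypothetical sketch of the wrapper strategy the review refers to: a greedy forward search that retrains a multi-label classifier for every candidate feature at every step, which is exactly what makes wrapper methods expensive compared to filter methods. The base classifier (scikit-learn's MultiOutputClassifier over logistic regression), the subset-accuracy scoring, and the data shapes are illustrative assumptions, not a method from the surveyed papers.

```python
# Sketch of a wrapper-style greedy forward feature selection for multi-label data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

def greedy_wrapper_selection(X, Y, n_select):
    X_tr, X_va, Y_tr, Y_va = train_test_split(X, Y, test_size=0.3, random_state=0)
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_select):
        best_feat, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            clf = MultiOutputClassifier(LogisticRegression(max_iter=200))
            clf.fit(X_tr[:, cols], Y_tr)  # one full training run per candidate feature
            score = accuracy_score(Y_va, clf.predict(X_va[:, cols]))  # subset accuracy
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

# Illustrative usage on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 12))
Y = (rng.random(size=(150, 3)) > 0.5).astype(int)
print(greedy_wrapper_selection(X, Y, n_select=4))
```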

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
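    As a rough illustration of this unifying view (not an example from the paper), the sketch below frames two of the listed subfields, multivariate regression and multi-label classification, as the same shape of problem: predicting an (n_samples, n_targets) target matrix. The scikit-learn estimators and random data are assumptions for illustration only.

```python
# Two MTP instances sharing the same input/output shape.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.multioutput import MultiOutputRegressor, MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))

Y_reg = rng.normal(size=(100, 4))                       # multivariate regression targets
Y_clf = (rng.random(size=(100, 4)) > 0.5).astype(int)   # multi-label classification targets

reg = MultiOutputRegressor(LinearRegression()).fit(X, Y_reg)
clf = MultiOutputClassifier(LogisticRegression(max_iter=200)).fit(X, Y_clf)

print(reg.predict(X[:2]).shape, clf.predict(X[:2]).shape)  # both predict a (2, 4) target matrix
```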

    A Fast Mutual-Information-Based Feature Selection Method for Multi-Label Learning

    To address the high time complexity of traditional heuristic-search-based multi-label feature selection algorithms, this paper proposes a simple, fast, and effective multi-label feature selection method, Easy and Fast Multi-Label Feature Selection (EF-MLFS). The method first uses mutual information to measure the relevance between each feature dimension and each label, then sums and ranks these relevance values, and finally selects features according to their total relevance. The method is compared with six representative existing multi-label feature selection methods, such as the Max-Dependency and Min-Redundancy (MDMR) algorithm and the Multi-Label Naive Bayes (MLNB) feature selection method. Experimental results show that feature selection and classification with the proposed method achieves the best results on common multi-label evaluation metrics such as Average Precision, Coverage, and Hamming Loss. Moreover, since the method requires no global search, its time complexity is significantly lower than that of MDMR, PMU, and similar methods.
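    A minimal sketch of the ranking idea described in this abstract, assuming scikit-learn's mutual_info_classif as the mutual-information estimator (the paper's own estimator may differ): score each feature by the sum of its mutual information with every label, sort, and keep the top k.

```python
# EF-MLFS-style filter ranking: sum per-label mutual information per feature, keep top-k.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def ef_mlfs_rank(X, Y, k):
    """X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label matrix.
    Returns indices of the k features with the largest summed relevance."""
    relevance = np.zeros(X.shape[1])
    for l in range(Y.shape[1]):
        # Accumulate MI(feature_j; label_l) over all labels for every feature j.
        relevance += mutual_info_classif(X, Y[:, l], random_state=0)
    return np.argsort(relevance)[::-1][:k]

# Illustrative usage on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
Y = (rng.random(size=(200, 5)) > 0.5).astype(int)
print(ef_mlfs_rank(X, Y, k=10))
```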

    Understanding Variability-Aware Analysis in Low-Maturity Variant-Rich Systems

    Context: Software systems often exist in many variants to support varying stakeholder requirements, such as specific market segments or hardware constraints. Systems with many variants (a.k.a. variant-rich systems) are highly complex due to the variability introduced to support customization. As such, assuring the quality of these systems is also challenging, since traditional single-system analysis techniques do not scale when applied. To tackle this complexity, several variability-aware analysis techniques have been conceived in the last two decades to assure the quality of a branch of variant-rich systems called software product lines. Unfortunately, these techniques find little application in practice, since many organizations do not use product-line engineering techniques, but instead rely on low-maturity clone & own strategies to manage their software variants. For instance, to perform an analysis that checks that all possible variants that can be configured by customers (or vendors) in a car personalization system conform to specified performance requirements, an organization needs to explicitly model system variability. However, in low-maturity variant-rich systems, this and similar kinds of analyses are challenging to perform due to (i) immature architectures that do not systematically account for variability, (ii) redundancy that is not exploited to reduce analysis effort, and (iii) missing essential meta-information, such as relationships between features and their implementation in source code.
    Objective: The overarching goal of the PhD is to facilitate quality assurance in low-maturity variant-rich systems. Consequently, in the first part of the PhD (comprising this thesis) we focus on gaining a better understanding of quality assurance needs in such systems and of their properties.
    Method: Our objectives are met by means of (i) knowledge-seeking research through case studies of open-source systems as well as surveys and interviews with practitioners; and (ii) solution-seeking research through the implementation and systematic evaluation of a recommender system that supports recording the information necessary for quality assurance in low-maturity variant-rich systems. With the former, we investigate, among other things, industrial needs and practices for analyzing variant-rich systems; with the latter, we seek to understand how to obtain the information necessary to leverage variability-aware analyses.
    Results: Four main results emerge from this thesis: first, we present the state of practice in assuring the quality of variant-rich systems; second, we present our empirical understanding of features and their characteristics, including information sources for locating them; third, we present our understanding of how developers' proactive feature location activities can best be supported during development; and lastly, we present our understanding of how features are used in the code of non-modular variant-rich systems, taking the case of feature scattering in the Linux kernel.
    Future work: In the second part of the PhD, we will focus on processes for adapting variability-aware analyses to low-maturity variant-rich systems.
    Keywords: Variant-rich Systems, Quality Assurance, Low Maturity Software Systems, Recommender Systems
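    To illustrate the kind of analysis the car personalization example alludes to, the sketch below enumerates all valid variants of a tiny, hypothetical feature model and checks each against a performance requirement; the feature names, cross-tree constraint, and cost model are invented for illustration and do not come from the thesis.

```python
# Enumerating and checking every valid variant of a small, explicitly modeled feature set.
from itertools import product

FEATURES = ["heated_seats", "adaptive_cruise", "premium_audio"]  # hypothetical features

def is_valid(cfg):
    # Hypothetical cross-tree constraint: adaptive cruise requires the premium audio ECU.
    return not (cfg["adaptive_cruise"] and not cfg["premium_audio"])

def boot_time_ms(cfg):
    # Hypothetical per-feature cost model standing in for a real performance analysis.
    return 800 + 120 * cfg["heated_seats"] + 300 * cfg["adaptive_cruise"] + 150 * cfg["premium_audio"]

REQUIREMENT_MS = 1200
for values in product([False, True], repeat=len(FEATURES)):
    cfg = dict(zip(FEATURES, values))
    if is_valid(cfg):
        ok = boot_time_ms(cfg) <= REQUIREMENT_MS
        print(cfg, "meets requirement" if ok else "violates requirement")
```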

    Enhanced context-aware framework for individual and crowd condition prediction

    A context-aware framework is a basic framework that utilizes contexts such as a user's individual activities, location, and time, which are hidden information derived from smartphone sensors. These data are used to monitor situations in crowd scenarios. Applications built on embedded sensors have the potential to monitor tasks that are otherwise practically difficult to access. Inaccuracies observed in individual activity recognition (IAR), caused by faulty accelerometer data and data classification problems, have made it inefficient when used for prediction. This study addressed this problem by introducing a feature extraction and selection method that provides higher accuracy by selecting only the relevant features and minimizing the false negative rate (FNR) of IAR used for crowd condition prediction. The approach is the enhanced context-aware framework (EHCAF) for the prediction of human movement activities during an emergency. Three new methods were introduced to ensure high accuracy and a low FNR. Firstly, an improved statistical-based time-frequency domain (SBTFD) method was introduced for representing and extracting hidden context information from sensor signals with improved accuracy. Secondly, a feature selection method (FSM) was used to achieve improved accuracy with SBTFD and a low false negative rate. Finally, a method for individual behaviour estimation (IBE) and crowd condition prediction, in which thresholds and crowd density determination (CDD) were developed and used, achieved a low false negative rate. Individual behaviour estimation used the best selected features, flow velocity estimation, and direction to determine the disparity value of abnormal individual behaviour in a crowd. These were used to evaluate individual and crowd density determination in terms of inflow, outflow, and crowd turbulence during an emergency. Classifiers were used to confirm the ability of the features to differentiate individual activity recognition data classes. Experiments with SBTFD and a decision tree (J48) classifier produced a maximum accuracy of 99.2% and a false negative rate of 3.3%. The individual classes were classified based on the 7 best features, which reduced dimensionality, increased accuracy to 99.1%, and yielded a low false negative rate (FNR) of 2.8%. In conclusion, the enhanced context-aware framework developed in this research proved to be a viable solution for individual and crowd condition prediction.
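    As a rough, hypothetical illustration of the kind of statistical time- and frequency-domain features an SBTFD-style extraction step might compute from an accelerometer window (the thesis's exact feature set, statistics, and window size are not reproduced here):

```python
# Statistical time- and frequency-domain features from one 3-axis accelerometer window.
import numpy as np

def sbtfd_like_features(window):
    """window: (n_samples, 3) accelerometer readings (x, y, z) for one time window."""
    feats = []
    for axis in range(window.shape[1]):
        sig = window[:, axis]
        spectrum = np.abs(np.fft.rfft(sig))          # magnitude spectrum of the axis signal
        feats += [
            sig.mean(), sig.std(), sig.min(), sig.max(),          # time-domain statistics
            spectrum.mean(), spectrum.std(), spectrum.argmax(),   # frequency-domain statistics
        ]
    return np.array(feats)

# Illustrative usage: 128 samples of simulated 3-axis accelerometer data.
rng = np.random.default_rng(0)
window = rng.normal(size=(128, 3))
print(sbtfd_like_features(window).shape)  # (21,) feature vector per window
```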