2,075 research outputs found

    A Survey on Particle Swarm Optimization for Association Rule Mining

    Get PDF
    Association rule mining (ARM) is one of the core techniques of data mining to discover potentially valuable association relationships from mixed datasets. In the current research, various heuristic algorithms have been introduced into ARM to address the high computation time of traditional ARM. Although a more detailed review of the heuristic algorithms based on ARM is available, this paper differs from the existing reviews in that we expected it to provide a more comprehensive and multi-faceted survey of emerging research, which could provide a reference for researchers in the field to help them understand the state-of-the-art PSO-based ARM algorithms. In this paper, we review the existing research results. Heuristic algorithms for ARM were divided into three main groups, including biologically inspired, physically inspired, and other algorithms. Additionally, different types of ARM and their evaluation metrics are described in this paper, and the current status of the improvement in PSO algorithms is discussed in stages, including swarm initialization, algorithm parameter optimization, optimal particle update, and velocity and position updates. Furthermore, we discuss the applications of PSO-based ARM algorithms and propose further research directions by exploring the existing problems.publishedVersio

    Improved Association Rule Mining-Based Data Sanitization for Privacy Preservation Model in Cloud, Journal of Telecommunications and Information Technology, 2023, nr 1

    Get PDF
    Data security in cloud services is achieved by imposing a broad range of privacy settings and restrictions. However, the different security techniques used fail to eliminate the hazard of serious data leakage, information loss and other vulnerabilities. Therefore, better security policy requirements are necessary to ensure acceptable data protection levels in the cloud. The two procedures presented in this paper are intended to build a new cloud data security method. Here, sensitive data stored in big datasets is protected from abuse via the data sanitization procedure relying on an improved apriori approach to clean the data. The main objective in this case is to generate a key using an optimization technique known as Corona-integrated Archimedes Optimization with Tent Map Estimation (CIAO-TME). Such a technique deals with both restoration and sanitization of data. The problem of optimizing the data preservation ratio (IPR), the hiding ratio (HR), and the degree of modification (DOM) is formulated and researched as well

    Temporal Information in Data Science: An Integrated Framework and its Applications

    Get PDF
    Data science is a well-known buzzword, that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on such a consideration, the present work starts with the analysis of a real case scenario, by considering the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks have been developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although such initial applications rely on already available algorithms, as we shall see, some clever analysis workflows had also to be developed. Afterwards, continuously driven by real data and real world applications, we turned ourselves to the question of how to handle temporal information within classical decision tree models. Our research brought us the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied into some real world analysis tasks, proving its worthiness. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in automated reasoning and formal verification fields, a natural direction in which to proceed was that of investigating how such solutions may be combined with machine learning, following two main tracks. First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems.Data science is a well-known buzzword, that is in fact composed of two distinct keywords, i.e., data and science. Data itself is of great importance: each analysis task begins from a set of examples. Based on such a consideration, the present work starts with the analysis of a real case scenario, by considering the development of a data warehouse-based decision support system for an Italian contact center company. Then, relying on the information collected in the developed system, a set of machine learning-based analysis tasks have been developed to answer specific business questions, such as employee work anomaly detection and automatic call classification. Although such initial applications rely on already available algorithms, as we shall see, some clever analysis workflows had also to be developed. Afterwards, continuously driven by real data and real world applications, we turned ourselves to the question of how to handle temporal information within classical decision tree models. Our research brought us the development of J48SS, a decision tree induction algorithm based on Quinlan's C4.5 learner, which is capable of dealing with temporal (e.g., sequential and time series) as well as atemporal (such as numerical and categorical) data during the same execution cycle. The decision tree has been applied into some real world analysis tasks, proving its worthiness. A key characteristic of J48SS is its interpretability, an aspect that we specifically addressed through the study of an evolutionary-based decision tree pruning technique. Next, since a lot of work concerning the management of temporal information has already been done in automated reasoning and formal verification fields, a natural direction in which to proceed was that of investigating how such solutions may be combined with machine learning, following two main tracks. First, we show, through the development of an enriched decision tree capable of encoding temporal information by means of interval temporal logic formulas, how a machine learning algorithm can successfully exploit temporal logic to perform data analysis. Then, we focus on the opposite direction, i.e., that of employing machine learning techniques to generate temporal logic formulas, considering a natural language processing scenario. Finally, as a conclusive development, the architecture of a system is proposed, in which formal methods and machine learning techniques are seamlessly combined to perform anomaly detection and predictive maintenance tasks. Such an integration represents an original, thrilling research direction that may open up new ways of dealing with complex, real-world problems

    A Multi-Level Framework for the Detection, Prioritization and Testing of Software Design Defects

    Full text link
    Large-scale software systems exhibit high complexity and become difficult to maintain. In fact, it has been reported that software cost dedicated to maintenance and evolution activities is more than 80% of the total software costs. In particular, object-oriented software systems need to follow some traditional design principles such as data abstraction, encapsulation, and modularity. However, some of these non-functional requirements can be violated by developers for many reasons such as inexperience with object-oriented design principles, deadline stress. This high cost of maintenance activities could potentially be greatly reduced by providing automatic or semi-automatic solutions to increase system‟s comprehensibility, adaptability and extensibility to avoid bad-practices. The detection of refactoring opportunities focuses on the detection of bad smells, also called antipatterns, which have been recognized as the design situations that may cause software failures indirectly. The correction of one bad smell may influence other bad smells. Thus, the order of fixing bad smells is important to reduce the effort and maximize the refactoring benefits. However, very few studies addressed the problem of finding the optimal sequence in which the refactoring opportunities, such as bad smells, should be ordered. Few other studies tried to prioritize refactoring opportunities based on the types of bad smells to determine their severity. However, the correction of severe bad smells may require a high effort which should be optimized and the relationships between the different bad smells are not considered during the prioritization process. The main goal of this research is to help software engineers to refactor large-scale systems with a minimum effort and few interactions including the detection, management and testing of refactoring opportunities. We report the results of an empirical study with an implementation of our bi-level approach. The obtained results provide evidence to support the claim that our proposal is more efficient, on average, than existing techniques based on a benchmark of 9 open source systems and 1 industrial project. We have also evaluated the relevance and usefulness of the proposed bi-level framework for software engineers to improve the quality of their systems and support the detection of transformation errors by generating efficient test cases.Ph.D.Information Systems Engineering, College of Engineering and Computer ScienceUniversity of Michigan-Dearbornhttp://deepblue.lib.umich.edu/bitstream/2027.42/136075/1/Dilan_Sahin_Final Dissertation.pdfDescription of Dilan_Sahin_Final Dissertation.pdf : Dissertatio

    Applied Metaheuristic Computing

    Get PDF
    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC

    A theoretical and computational basis for CATNETS

    Get PDF
    The main content of this report is the identification and definition of market mechanisms for Application Layer Networks (ALNs). On basis of the structured Market Engineering process, the work comprises the identification of requirements which adequate market mechanisms for ALNs have to fulfill. Subsequently, two mechanisms for each, the centralized and the decentralized case are described in this document. These build the theoretical foundation for the work within the following two years of the CATNETS project. --Grid Computing

    Theoretical and Computational Basis for Economical Ressource Allocation in Application Layer Networks - Annual Report Year 1

    Get PDF
    This paper identifies and defines suitable market mechanisms for Application Layer Networks (ALNs). On basis of the structured Market Engineering process, the work comprises the identification of requirements which adequate market mechanisms for ALNs have to fulfill. Subsequently, two mechanisms for each, the centralized and the decentralized case are described in this document. --Grid Computing

    Bioinformatics Applications Based On Machine Learning

    Get PDF
    The great advances in information technology (IT) have implications for many sectors, such as bioinformatics, and has considerably increased their possibilities. This book presents a collection of 11 original research papers, all of them related to the application of IT-related techniques within the bioinformatics sector: from new applications created from the adaptation and application of existing techniques to the creation of new methodologies to solve existing problems
    corecore