381 research outputs found

    Models and Strategies for Variants of the Job Shop Scheduling Problem

    Full text link
    Recently, a variety of constraint programming and Boolean satisfiability approaches to scheduling problems have been introduced. They have in common the use of relatively simple propagation mechanisms and an adaptive way to focus on the most constrained part of the problem. In some cases, these methods compare favorably to more classical constraint programming methods relying on propagation algorithms for global unary or cumulative resource constraints and dedicated search heuristics. In particular, we described an approach that combines restarting, with a generic adaptive heuristic and solution guided branching on a simple model based on a decomposition of disjunctive constraints. In this paper, we introduce an adaptation of this technique for an important subclass of job shop scheduling problems (JSPs), where the objective function involves minimization of earliness/tardiness costs. We further show that our technique can be improved by adding domain specific information for one variant of the JSP (involving time lag constraints). In particular we introduce a dedicated greedy heuristic, and an improved model for the case where the maximal time lag is 0 (also referred to as no-wait JSPs).Comment: Principles and Practice of Constraint Programming - CP 2011, Perugia : Italy (2011

    A hybrid decision tree/genetic algorithm method for data mining

    Get PDF

    An under-Sampled Approach for Handling Skewed Data Distribution using Cluster Disjuncts

    Get PDF
    In Data mining and Knowledge Discovery hidden and valuable knowledge from the data sources is discovered. The traditional algorithms used for knowledge discovery are bottle necked due to wide range of data sources availability. Class imbalance is a one of the problem arises due to data source which provide unequal class i.e. examples of one class in a training data set vastly outnumber examples of the other class(es). Researchers have rigorously studied several techniques to alleviate the problem of class imbalance, including resampling algorithms, and feature selection approaches to this problem. In this paper, we present a new hybrid frame work dubbed as Majority Under-sampling based on Cluster Disjunct (MAJOR_CD) for learning from skewed training data. This algorithm provides a simpler and faster alternative by using cluster disjunct concept. We conduct experiments using twelve UCI data sets from various application domains using five algorithms for comparison on six evaluation metrics. The empirical study suggests that MAJOR_CD have been believed to be effective in addressing the class imbalance problem

    SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

    Get PDF
    The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered \de facto" standard in the framework of learning from imbalanced data. This is due to its simplicity in the design of the procedure, as well as its robustness when applied to di erent type of problems. Since its publication in 2002, SMOTE has proven successful in a variety of applications from several di erent domains. SMOTE has also inspired several approaches to counter the issue of class imbalance, and has also signi cantly contributed to new supervised learning paradigms, including multilabel classi cation, incremental learning, semi-supervised learning, multi-instance learning, among others. It is standard benchmark for learning from imbalanced data. It is also featured in a number of di erent software packages | from open source to commercial. In this paper, marking the fteen year anniversary of SMOTE, we re ect on the SMOTE journey, discuss the current state of a airs with SMOTE, its applications, and also identify the next set of challenges to extend SMOTE for Big Data problems.This work have been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P, TIN2015-68454-R and TIN2017-89517-P; the Project 887 BigDaP-TOOLS - Ayudas Fundaci on BBVA a Equipos de Investigaci on Cient ca 2016; and the National Science Foundation (NSF) Grant IIS-1447795

    Prediction in Financial Markets: The Case for Small Disjuncts

    Get PDF
    Predictive models in regression and classification problems typically have a single model that covers most, if not all, cases in the data. At the opposite end of the spectrum is a collection of models each of which covers a very small subset of the decision space. These are referred to as “small disjuncts.” The tradeoffs between the two types of models have been well documented. Single models, especially linear ones, are easy to interpret and explain. In contrast, small disjuncts do not provide as clean or as simple an interpretation of the data, and have been shown by several researchers to be responsible for a disproportionately large number of errors when applied to out of sample data. This research provides a counterpoint, demonstrating that “simple” small disjuncts provide a credible model for financial market prediction, a problem with a high degree of noise. A related novel contribution of this paper is a simple method for measuring the “yield” of a learning system, which is the percentage of in sample performance that the learned model can be expected to realize on out-of-sample data. Curiously, such a measure is missing from the literature on regression learning algorithms.NYU Stern School of Busines

    An insight into imbalanced Big Data classification: outcomes and challenges

    Get PDF
    Big Data applications are emerging during the last years, and researchers from many disciplines are aware of the high advantages related to the knowledge extraction from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way to adapt for commodity hardware. Being still a recent discipline, few research has been conducted on imbalanced classification for Big Data. The reasons behind this are mainly the difficulties in adapting standard techniques to the MapReduce programming style. Additionally, inner problems of imbalanced data, namely lack of data and small disjuncts, are accentuated during the data partitioning to fit the MapReduce programming style. This paper is designed under three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current research state of this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we will carry out a discussion on the challenges and future directions for the topic.This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795

    Intelligent and Improved Self-Adaptive Anomaly based Intrusion Detection System for Networks

    Get PDF
    With the advent of digital technology, computer networks have developed rapidly at an unprecedented pace contributing tremendously to social and economic development. They have become the backbone for all critical sectors and all the top Multi-National companies. Unfortunately, security threats for computer networks have increased dramatically over the last decade being much brazen and bolder. Intrusions or attacks on computers and networks are activities or attempts to jeopardize main system security objectives, which called as confidentiality, integrity and availability. They lead mostly in great financial losses, massive sensitive data leaks, thereby decreasing efficiency and the quality of productivity of an organization. There is a great need for an effective Network Intrusion Detection System (NIDS), which are security tools designed to interpret the intrusion attempts in incoming network traffic, thereby achieving a solid line of protection against inside and outside intruders. In this work, we propose to optimize a very popular soft computing tool prevalently used for intrusion detection namely Back Propagation Neural Network (BPNN) using a novel machine learning framework called “ISAGASAA”, based on Improved Self-Adaptive Genetic Algorithm (ISAGA) and Simulated Annealing Algorithm (SAA). ISAGA is our variant of standard Genetic Algorithm (GA), which is developed based on GA improved through an Adaptive Mutation Algorithm (AMA) and optimization strategies. The optimization strategies carried out are Parallel Processing (PP) and Fitness Value Hashing (FVH) that reduce execution time, convergence time and save processing power. While, SAA was incorporated to ISAGA in order to optimize its heuristic search. Experimental results based on Kyoto University benchmark dataset version 2015 demonstrate that our optimized NIDS based BPNN called “ANID BPNN-ISAGASAA” outperforms several state-of-art approaches in terms of detection rate and false positive rate. Moreover, improvement of GA through FVH and PP saves processing power and execution time. Thus, our model is very much convenient for network anomaly detection.

    Evolutionary Learning of Hierarchical Decision Rules

    Get PDF
    This paper describes an approach based on evolutionary algorithms, hierarchical decision rules (HIDER), for learning rules in continuous and discrete domains. The algorithm produces a hierarchical set of rules, that is, the rules are sequentially obtained and must be, therefore, tried in order until one is found whose conditions are satisfied. Thus, the number of rules may be reduced because the rules could be inside one another. The evolutionary algorithm uses both real and binary coding for the individuals of the population. We have tested our system on real data from the UCI Repository, and the results of a ten-fold cross-validation are compared to C4.5s, C4.5Rules, See5s, and See5Rules. The experiments show that HIDER works well in practice
    corecore