7,290 research outputs found

    Evolving text classification rules with genetic programming

    Get PDF
    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications

    A Review of Rule Learning Based Intrusion Detection Systems and Their Prospects in Smart Grids

    Get PDF

    Partitioned Sampling of Public Opinions Based on Their Social Dynamics

    Full text link
    Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals' opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic algorithm to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing

    A new approach for discovering business process models from event logs.

    Get PDF
    Process mining is the automated acquisition of process models from the event logs of information systems. Although process mining has many useful applications, not all inherent difficulties have been sufficiently solved. A first difficulty is that process mining is often limited to a setting of non-supervised learnings since negative information is often not available. Moreover, state transitions in processes are often dependent on the traversed path, which limits the appropriateness of search techniques based on local information in the event log. Another difficulty is that case data and resource properties that can also influence state transitions are time-varying properties, such that they cannot be considered ascross-sectional.This article investigates the use of first-order, ILP classification learners for process mining and describes techniques for dealing with each of the above mentioned difficulties. To make process mining a supervised learning task, we propose to include negative events in the event log. When event logs contain no negative information, a technique is described to add artificial negative examples to a process log. To capture history-dependent behavior the article proposes to take advantage of the multi-relational nature of ILP classification learners. Multi-relational process mining allows to search for patterns among multiple event rows in the event log, effectively basing its search on global information. To deal with time-varying case data and resource properties, a closed-world version of the Event Calculus has to be added as background knowledge, transforming the event log effectively in a temporal database. First experiments on synthetic event logs show that first-order classification learners are capable of predicting the behavior with high accuracy, even under conditions of noise.Credit; Credit scoring; Models; Model; Applications; Performance; Space; Decision; Yield; Real life; Risk; Evaluation; Rules; Neural networks; Networks; Classification; Research; Business; Processes; Event; Information; Information systems; Systems; Learning; Data; Behavior; Patterns; IT; Event calculus; Knowledge; Database; Noise;
    corecore