6 research outputs found

    I-prune: Item selection for associative classification

    Associative classification is characterized by accurate models and high model generation time. Most time is spent in extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.
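
    The idea of scoring items with the chi-square measure and dropping the uninteresting ones can be illustrated with a minimal sketch. This is not the I-prune algorithm itself (which prunes items during rule extraction); it is a simplified pre-filter under stated assumptions. The function names, the toy data, and the default threshold of 3.84 (the 5% critical value for one degree of freedom, i.e. two classes) are all illustrative choices, not taken from the paper.

```python
from collections import Counter


def chi_square_item_scores(transactions, labels):
    """Chi-square statistic of each item's presence/absence against the class label."""
    n = len(transactions)
    class_counts = Counter(labels)
    item_counts = Counter()
    joint = Counter()  # (item, label) -> number of transactions containing both
    for items, label in zip(transactions, labels):
        for item in set(items):
            item_counts[item] += 1
            joint[(item, label)] += 1

    scores = {}
    for item, n_item in item_counts.items():
        chi2 = 0.0
        for label, n_class in class_counts.items():
            observed_present = joint[(item, label)]
            observed_absent = n_class - observed_present
            expected_present = n_item * n_class / n
            expected_absent = (n - n_item) * n_class / n
            if expected_present > 0:
                chi2 += (observed_present - expected_present) ** 2 / expected_present
            if expected_absent > 0:
                chi2 += (observed_absent - expected_absent) ** 2 / expected_absent
        scores[item] = chi2
    return scores


def prune_items(transactions, labels, threshold=3.84):
    """Drop items whose chi-square score falls below the (assumed) threshold."""
    scores = chi_square_item_scores(transactions, labels)
    keep = {item for item, score in scores.items() if score >= threshold}
    return [[item for item in t if item in keep] for t in transactions]


# Toy data: "a" and "b" track the class exactly, "x" is independent of it.
transactions = [["a", "x"], ["a", "x"], ["a"], ["a"],
                ["b", "x"], ["b", "x"], ["b"], ["b"]]
labels = ["p", "p", "p", "p", "n", "n", "n", "n"]
pruned = prune_items(transactions, labels)
```

    In the toy data, item "x" appears equally often in both classes, so its score is zero and it is removed before any rule would be mined from it.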

    Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review

    Sleep apnea syndrome (SAS), which can significantly decrease quality of life, is associated with major health risks such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications of computational intelligence to SAS prediction and detection, including their performance, beneficial and challenging effects, and modeling for decision-making in multiple scenarios.

    Classification Based on both Attribute Value Weight and Tuple Weight under the Cloud Computing

    In recent years, cloud computing has attracted growing attention. Users need to deal with massive data in the cloud computing environment, and classification can predict the needs of users from such large data. Traditional classification methods frequently adopt one of two approaches: removing an instance after it is covered by a rule, or decreasing the tuple weight of an instance after it is covered by a rule. The quality of these traditional classifiers may not be high, and as a result they cannot achieve high classification accuracy on some data. In this paper, we present a new classification approach, called classification based on both attribute value weight and tuple weight (CATW). CATW is distinguished from traditional classifiers in two respects. First, CATW uses both attribute value weight and tuple weight. Second, CATW proposes a new measure to select the best attribute values and generate a high-quality classification rule set. Our experimental results indicate that CATW can achieve higher classification accuracy than some traditional classifiers.
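
    The two traditional strategies mentioned in the abstract (removing a covered instance versus lowering its tuple weight) can be contrasted in a short sketch. This does not reproduce CATW or its selection measure, which the abstract does not specify; the function name, the dictionary representation of weights, and the decay factor of 0.5 are assumptions made for illustration only.

```python
def apply_rule_coverage(weights, covered, strategy="decay", decay=0.5):
    """Update instance weights after a rule covers some instances.

    weights : dict mapping instance id -> current tuple weight
    covered : set of instance ids matched by the newly generated rule
    strategy: "remove" drops covered instances entirely;
              "decay" multiplies their weight by `decay` (hypothetical factor)
    """
    if strategy == "remove":
        return {i: w for i, w in weights.items() if i not in covered}
    if strategy == "decay":
        return {i: (w * decay if i in covered else w) for i, w in weights.items()}
    raise ValueError(f"unknown strategy: {strategy!r}")


weights = {0: 1.0, 1: 1.0, 2: 1.0}
covered = {0, 2}
after_removal = apply_rule_coverage(weights, covered, strategy="remove")
after_decay = apply_rule_coverage(weights, covered, strategy="decay")
```

    With removal, covered instances can never contribute to later rules; with decay, they remain available but count for less each time they are covered.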

    EXPLORING IMPACT OF EDUCATIONAL AND ECONOMIC FACTORS ON NATIONAL INTELLECTUAL PRODUCTIVITY USING MACHINE LEARNING METHODS

    The patent process is representative of a nationwide means for innovations and new ideas to be recognized. The U.S. Patent Office, since its inception in 1790, has issued nearly five million patents. These patents span from U.S. Patent #1, which was for an improvement in the making of Pot ash and Pearl ash by a new Apparatus and Process, to today's patents, which deal with technologies and mediums that were unimaginable at the Patent Office's inception. The purpose of this study is to determine what social and economic factors at the federal level have the highest impact on national productivity, measured by the number of patents applied for and/or granted each year. The study applies machine learning algorithms and predictive analysis to fifty years' worth of data to determine what macroeconomic and educational factors have the most impact on patents. The first part of this study describes the methods and algorithms used during this research. The second part discusses the results and what those results reveal about the impact of education and economic factors as they relate to national creativity and intellectual productivity. The goal of this study is to determine what factors affect national intellectual productivity in a given year. This data will be useful for governments, both local and federal, when faced with educational and economic issues.

    The effect of threshold values on association rule based classification accuracy

    Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm, which is also described.
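
    A hill-climbing search over the support and confidence thresholds can be sketched as follows. This is a generic coordinate-wise hill climb, not the specific procedure from the paper; the `evaluate` callback (which would train a CARM classifier with the given thresholds and return its accuracy), the starting point, step sizes, and bounds are all hypothetical. A smooth stand-in function replaces the classifier so that the sketch is self-contained.

```python
def hill_climb(evaluate, start, step=(1.0, 5.0),
               bounds=((0.0, 100.0), (0.0, 100.0)), min_step=0.1):
    """Search (support, confidence) thresholds for the highest evaluate() score.

    Tries one step up and down in each dimension; moves whenever the score
    strictly improves, and halves the step sizes when no neighbour is better.
    """
    best = tuple(start)
    best_score = evaluate(*best)
    step = list(step)
    while max(step) >= min_step:
        improved = False
        for dim in range(2):
            for direction in (-1, 1):
                candidate = list(best)
                candidate[dim] += direction * step[dim]
                low, high = bounds[dim]
                if not low <= candidate[dim] <= high:
                    continue
                score = evaluate(*candidate)
                if score > best_score:
                    best, best_score, improved = tuple(candidate), score, True
        if not improved:
            step = [s / 2 for s in step]
    return best, best_score


def stand_in_accuracy(support, confidence):
    """Stand-in for classifier accuracy (assumed shape, peak at 2% / 60%)."""
    return -((support - 2.0) ** 2) - ((confidence - 60.0) ** 2) / 100


best_thresholds, best_score = hill_climb(stand_in_accuracy, start=(5.0, 50.0))
```

    Because every call to `evaluate` would retrain a classifier in practice, the cost of the search is dominated by the speed of the underlying CARM algorithm, which is consistent with the abstract's remark about pairing the method with a fast one.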

    Language-independent pre-processing of large document bases for text classification

    Text classification is a well-known topic in the research of knowledge discovery in databases. Algorithms for text classification generally involve two stages. The first is concerned with identification of textual features (i.e. words and/or phrases) that may be relevant to the classification process. The second is concerned with classification rule mining and categorisation of "unseen" textual data. The first stage is the subject of this thesis and often involves an analysis of text that is both language-specific (and possibly domain-specific), and that may also be computationally costly, especially when dealing with large datasets. Existing approaches to this stage are not, therefore, generally applicable to all languages. In this thesis, we examine a number of alternative keyword selection methods and phrase generation strategies, coupled with two potential significant word list construction mechanisms and two final significant word selection mechanisms, to identify such words and/or phrases in a given textual dataset that are expected to serve to distinguish between classes, by simple, language-independent statistical properties. We present experimental results, using common (large) textual datasets presented in two distinct languages, to show that the proposed approaches can produce good performance with respect to both classification accuracy and processing efficiency. In other words, the study presented in this thesis demonstrates the possibility of efficiently solving the traditional text classification problem in a language-independent (also domain-independent) manner.