6 research outputs found
I-prune: Item selection for associative classification
Associative classification is characterized by accurate models and high model generation time. Most of that time is spent extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.
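The item-pruning idea above can be illustrated with a minimal sketch. It assumes each transaction is a set of items with one class label, scores every item with the chi-square statistic of its presence/absence against the class, and drops low-scoring items before any rule mining. The function names and the pre-filtering strategy are assumptions for illustration; the abstract states that I-prune prunes items as soon as they are detected during extraction, which this sketch does not reproduce.

```python
from collections import Counter

def chi_square(transactions, labels, item):
    """Chi-square statistic of the (item present/absent) x (class) table."""
    n = len(transactions)
    classes = sorted(set(labels))
    # Observed counts for each (presence, class) cell.
    observed = Counter((item in t, y) for t, y in zip(transactions, labels))
    row = {p: sum(observed[(p, c)] for c in classes) for p in (True, False)}
    col = Counter(labels)
    stat = 0.0
    for p in (True, False):
        for c in classes:
            expected = row[p] * col[c] / n
            if expected > 0:  # skip empty rows (item in all or no transactions)
                stat += (observed[(p, c)] - expected) ** 2 / expected
    return stat

def prune_items(transactions, labels, threshold):
    """Remove items whose chi-square score falls below the threshold (hypothetical helper)."""
    items = set().union(*transactions)
    keep = {i for i in items if chi_square(transactions, labels, i) >= threshold}
    return [t & keep for t in transactions]

transactions = [{"a", "x"}, {"a", "x"}, {"b", "x"}, {"b", "x"}]
labels = ["pos", "pos", "neg", "neg"]
# 3.84 is the 95% critical value of chi-square with one degree of freedom.
pruned = prune_items(transactions, labels, threshold=3.84)
```

In this toy data the item "x" appears in every transaction, carries no class information, and is removed, while "a" and "b" are kept.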
Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review
Sleep apnea syndrome (SAS), which can significantly decrease quality of life, is associated with major health risks such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications of computational intelligence to SAS, including their performance, beneficial and challenging effects, and modeling for decision-making in multiple scenarios.
Classification Based on both Attribute Value Weight and Tuple Weight under the Cloud Computing
In recent years, cloud computing has attracted increasing attention. Users need to deal with massive amounts of data in the cloud computing environment, and classification can predict the needs of users from such large data. Traditional classification methods frequently adopt one of two strategies: one removes an instance after it is covered by a rule, and the other decreases the tuple weight of an instance after it is covered by a rule. The quality of these traditional classifiers may not be high; as a result, they cannot achieve high classification accuracy on some data. In this paper, we present a new classification approach, called classification based on both attribute value weight and tuple weight (CATW). CATW is distinguished from some traditional classifiers in two aspects. First, CATW uses both attribute value weight and tuple weight. Second, CATW proposes a new measure to select the best attribute values and generate a high-quality classification rule set. Our experimental results indicate that CATW can achieve higher classification accuracy than some traditional classifiers.
EXPLORING IMPACT OF EDUCATIONAL AND ECONOMIC FACTORS ON NATIONAL INTELLECTUAL PRODUCTIVITY USING MACHINE LEARNING METHODS
The patent process is a nationwide means by which innovations and new ideas are recognized. The U.S. Patent Office, since its inception in 1790, has issued nearly five million patents. These patents span from U.S. Patent #1, which was for an improvement "in the making of Pot ash and Pearl ash by a new Apparatus and Process", to today's patents, which deal with technologies and mediums that were unimaginable at the Patent Office's inception. The purpose of this study is to determine which social and economic factors at the federal level have the highest impact on national productivity, measured by the number of patents applied for and/or granted each year. We use machine learning algorithms and predictive analysis on fifty years' worth of data to determine which macroeconomic and educational factors have the most impact on patents. The first part of this study describes the methods and algorithms used during this research. The second part discusses the results and what they reveal about the impact of educational and economic factors on national creativity and intellectual productivity. The goal of this study is to determine which factors affect national intellectual productivity in a given year. This data will be useful for governments, both local and federal, when faced with educational and economic issues.
The effect of threshold values on association rule based classification accuracy
Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods. We show that the accuracy can almost always be improved by a suitable choice of parameters, and describe a hill-climbing method for finding the best parameter settings. We also demonstrate that the proposed hill-climbing method is most effective when coupled with a fast CARM algorithm such as the TFPC algorithm, which is also described.
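A generic hill-climbing search over the two thresholds might look like the sketch below. The abstract does not give the details of the paper's procedure, so the step-halving scheme, the parameter names, and the `accuracy` callback are all assumptions; in practice `accuracy` would train a CARM classifier with the given support and confidence thresholds (in percent) and return its measured accuracy, which is why a fast classifier matters.

```python
def hill_climb(accuracy, support, confidence, step=8.0, min_step=0.5):
    """Greedy search for (support, confidence) maximising accuracy(support, confidence).

    `accuracy` is a hypothetical callback supplied by the caller.
    """
    best = accuracy(support, confidence)
    while step >= min_step:
        # Candidate moves: one step along each axis, kept inside (0, 100].
        neighbours = [
            (support + ds, confidence + dc)
            for ds, dc in ((step, 0), (-step, 0), (0, step), (0, -step))
            if 0 < support + ds <= 100 and 0 < confidence + dc <= 100
        ]
        scored = [(accuracy(s, c), s, c) for s, c in neighbours]
        top = max(scored) if scored else None
        if top and top[0] > best:
            best, support, confidence = top  # move to the better neighbour
        else:
            step /= 2  # no improvement: search more finely
    return support, confidence, best

def toy_accuracy(s, c):
    # Stand-in objective with a single peak at support=3, confidence=60.
    return -((s - 3) ** 2 + (c - 60) ** 2)

result = hill_climb(toy_accuracy, support=1.0, confidence=50.0)
```

Each call to `accuracy` corresponds to one full train-and-evaluate cycle, so the number of evaluations grows with the number of accepted moves and step reductions.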
Language-independent pre-processing of large document bases for text classification
Text classification is a well-known topic in the research of knowledge discovery in databases. Algorithms for text classification generally involve two stages. The first is concerned with identification of textual features (i.e. words and/or phrases) that may be relevant to the classification process. The second is concerned with classification rule mining and categorisation of "unseen" textual data. The first stage is the subject of this thesis and often involves an analysis of text that is both language-specific (and possibly domain-specific), and that may also be computationally costly especially when dealing with large datasets. Existing approaches to this stage are not, therefore, generally applicable to all languages. In this thesis, we examine a number of alternative keyword selection methods and phrase generation strategies, coupled with two potential significant word list construction mechanisms and two final significant word selection mechanisms, to identify such words and/or phrases in a given textual dataset that are expected to serve to distinguish between classes, by simple, language-independent statistical properties. We present experimental results, using common (large) textual datasets presented in two distinct languages, to show that the proposed approaches can produce good performance with respect to both classification accuracy and processing efficiency. In other words, the study presented in this thesis demonstrates the possibility of efficiently solving the traditional text classification problem in a language-independent (also domain-independent) manner.