8,485 research outputs found
A Nonparametric Ensemble Binary Classifier and its Statistical Properties
In this work, we propose an ensemble of classification trees (CT) and
artificial neural networks (ANN). Several statistical properties including
universal consistency and upper bound of an important parameter of the proposed
classifier are shown. Numerical evidence is also provided using various real
life data sets to assess the performance of the model. Our proposed
nonparametric ensemble classifier doesn't suffer from the `curse of
dimensionality' and can be used in a wide variety of feature selection cum
classification problems. Performance of the proposed model is quite better when
compared to many other state-of-the-art models used for similar situations
RESEARCH ISSUES CONCERNING ALGORITHMS USED FOR OPTIMIZING THE DATA MINING PROCESS
In this paper, we depict some of the most widely used data mining algorithms that have an overwhelming utility and influence in the research community. A data mining algorithm can be regarded as a tool that creates a data mining model. After analyzing a set of data, an algorithm searches for specific trends and patterns, then defines the parameters of the mining model based on the results of this analysis. The above defined parameters play a significant role in identifying and extracting actionable patterns and detailed statistics. The most important algorithms within this research refer to topics like clustering, classification, association analysis, statistical learning, link mining. In the following, after a brief description of each algorithm, we analyze its application potential and research issues concerning the optimization of the data mining process. After the presentation of the data mining algorithms, we will depict the most important data mining algorithms included in Microsoft and Oracle software products, useful suggestions and criteria in choosing the most recommended algorithm for solving a mentioned task, advantages offered by these software products.data mining optimization, data mining algorithms, software solutions
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have been proved to be successful
individually in different contexts. Both of them have their own advantages and
limitations. For instance, although classification algorithms are more powerful
than clustering methods in predicting class labels of objects, they do not
perform well when there is a lack of sufficient manually labeled reliable data.
On the other hand, although clustering algorithms do not produce label
information for objects, they provide supplementary constraints (e.g., if two
objects are clustered together, it is more likely that the same label is
assigned to both of them) that one can leverage for label prediction of a set
of unknown objects. Therefore, systematic utilization of both these types of
algorithms together can lead to better prediction performance. In this paper,
We propose a novel algorithm, called EC3 that merges classification and
clustering together in order to support both binary and multi-class
classification. EC3 is based on a principled combination of multiple
classification and multiple clustering methods using an optimization function.
We theoretically show the convexity and optimality of the problem and solve it
by block coordinate descent method. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. We show
that our methods outperform other baselines for every single dataset, achieving
at most 10% higher AUC. Moreover our methods are faster (1.21 times faster than
the best baseline), more resilient to noise and class imbalance than the best
baseline method.Comment: 14 pages, 7 figures, 11 table
- ā¦