Designing multiple classifier combinations: a survey
Classification accuracy can be improved through a multiple classifier approach: multiple classifier combinations have been shown to achieve better classification accuracy than a single classifier. Designing a multiple classifier combination involves two main problems: constructing the classifier ensemble and constructing the combiner. This paper reviews approaches to constructing both. For each approach, the relevant methods are reviewed and their advantages and disadvantages are highlighted. A random strategy and majority voting are the most commonly used methods for constructing the ensemble and the combiner, respectively. The results presented in this review are expected to serve as a road map for designing multiple classifier combinations.
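As a rough illustration of the two most common choices named in this survey, the Python sketch below builds an ensemble with a random strategy (assumed here to be bootstrap resampling, as in bagging) and combines member outputs by majority voting. The function names `random_bootstrap_indices` and `majority_vote` are hypothetical, and breaking ties in favour of the lowest-index member is an arbitrary choice, not something the survey prescribes.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence


def random_bootstrap_indices(n_samples: int, n_members: int, seed: int = 0) -> List[List[int]]:
    """Random strategy (assumed: bootstrap sampling) for building ensemble members.

    Each member receives n_samples training indices drawn with replacement,
    so members are trained on different, overlapping subsets of the data.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_samples) for _ in range(n_samples)] for _ in range(n_members)]


def majority_vote(member_predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine member outputs by (plurality) majority vote.

    member_predictions[m][i] is member m's predicted label for sample i.
    Ties go to the tied label that appears first in member order.
    """
    n_samples = len(member_predictions[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in member_predictions]
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Three members, three samples.
print(majority_vote([[0, 1, 1], [0, 1, 0], [1, 1, 0]]))  # [0, 1, 0]
```

Majority voting needs only the members' crisp labels, which is one reason it is the default combiner; weighted or trainable combiners need extra validation data.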
Applications of Mining Arabic Text: A Review
Since the emergence of text mining, there has been growing interest in applying text mining tasks to text written in Arabic, and researchers face several challenges in doing so. These tasks include Arabic text summarization, one of the challenging open research areas in natural language processing (NLP) and text mining, as well as Arabic text categorization and Arabic sentiment analysis. This chapter reviews past and current research and trends in these areas, along with some future challenges that need to be tackled. It also presents case studies for two of the reviewed approaches.
Water filtration by using apple and banana peels as activated carbon
A water filter is an important device for reducing the contaminants in raw water. Activated charcoal is used to adsorb these contaminants, and fruit peels are a suitable alternative carbon source to substitute for charcoal. The main goal of this study is to determine the role of apple and banana peel powders as activated carbon in a water filter. The peels are dried and blended into a powder so that they can adsorb the contaminants, and the raw water is then compared before and after filtering. After filtering, the pH reading was 6.8, which is within the normal range, and the recorded turbidity was 658 NTU. The filtered water was also visibly clearer than the raw water. This study found that fruit peels such as banana and apple peels are an effective natural substitute for charcoal as an adsorbent.
Multiple proportion case-basing driven CBRE and its application in the evaluation of possible failure of firms
Case-based reasoning (CBR) is a unique tool for the evaluation of possible failure of firms (EOPFOF) because of its ease of interpretation and implementation. Ensemble computing, a form of group decision-making in society, offers a potential means of improving the predictive performance of CBR-based EOPFOF. This research aims to integrate bagging and proportion case-basing with CBR to produce a proportion bagging CBR method for EOPFOF. First, multiple diverse case bases are produced by multiple case-basing, in which a volume parameter controls the size of each case base. Then, the classic case retrieval algorithm is applied to generate diverse member CBR predictors. Finally, majority voting, the most frequently used aggregation mechanism in ensemble computing, combines the outputs of the member CBR predictors to produce the final prediction of the CBR ensemble. In an empirical experiment, we statistically validated the results of the CBR ensemble built from multiple case bases by comparing them with those of multivariate discriminant analysis, logistic regression, classic CBR, the best member CBR predictor, and the bagging CBR ensemble. The results on Chinese EOPFOF three years prior to failure indicate that the new CBR ensemble, which significantly improved CBR's predictive ability, outperformed all the comparative methods.
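A minimal sketch of the proportion bagging CBR idea, under stated assumptions: the volume parameter is taken to be the fraction of the original case base drawn (with replacement) into each member case base, the classic case retrieval step is approximated by k-nearest-neighbour retrieval with Euclidean distance, and member outputs are aggregated by majority voting. All names (`proportion_case_bases`, `cbr_predict`, `cbr_ensemble_predict`) and the toy cases are hypothetical; this is not the authors' implementation.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A case is (feature vector, label); e.g. label 1 = failed firm, 0 = healthy (assumed coding).
Case = Tuple[Sequence[float], int]


def proportion_case_bases(cases: List[Case], n_members: int, volume: float,
                          seed: int = 0) -> List[List[Case]]:
    """Draw n_members case bases of size round(volume * len(cases)), with replacement.

    `volume` plays the role of the size-controlling parameter (assumed interpretation).
    """
    rng = random.Random(seed)
    size = max(1, round(volume * len(cases)))
    return [[rng.choice(cases) for _ in range(size)] for _ in range(n_members)]


def cbr_predict(case_base: List[Case], query: Sequence[float], k: int = 3) -> int:
    """Retrieve the k most similar cases (Euclidean distance) and reuse their majority label."""
    nearest = sorted(case_base, key=lambda c: math.dist(c[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cbr_ensemble_predict(case_bases: List[List[Case]], query: Sequence[float], k: int = 3) -> int:
    """One member CBR predictor per case base; the final output is their majority vote."""
    votes = [cbr_predict(cb, query, k) for cb in case_bases]
    return Counter(votes).most_common(1)[0][0]


# Invented toy data: two well-separated groups of firms.
cases = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((5, 5), 1), ((5, 6), 1), ((6, 5), 1)]
bases = proportion_case_bases(cases, n_members=5, volume=0.8, seed=42)
print(cbr_ensemble_predict(bases, (5.5, 5.5)))
```

Lowering `volume` makes the member case bases smaller and more diverse, at the cost of weaker individual predictors; that trade-off is what the volume parameter is meant to expose.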
Text Data Mining: Theory and Methods
This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies employed within this discipline while at the same time making the reader aware of some of the interesting challenges that remain to be solved in the area. Finally, the article serves as a very rudimentary tutorial on some of these techniques while also providing the reader with a list of references for additional study.
Comment: Published at http://dx.doi.org/10.1214/07-SS016 in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Rough set based ensemble classifier for web page classification
Combining the results of a number of individually trained classification systems to obtain a more accurate classifier is a widely used technique in pattern recognition. In this article, we introduce a rough-set-based meta classifier to classify web pages. The proposed method consists of two parts. In the first part, the output of every individual classifier is used to construct a decision table. In the second part, rough set attribute reduction and rule generation are applied to the decision table to construct a meta classifier. It is shown that (1) the meta classifier performs better than every constituent classifier, and (2) the meta classifier is optimal with respect to a quality measure defined in the article. Experimental studies show that the meta classifier improves classification accuracy uniformly across several benchmark corpora and outperforms other ensemble approaches in accuracy by a decisive margin, confirming the theoretical results. In addition, it reduces the CPU load compared with other ensemble classification techniques by removing redundant classifiers from the combination.
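The two-part construction can be sketched in Python as follows. This is a simplified illustration, not the article's algorithm: each row of the decision table holds the base classifiers' outputs for one validation sample and the decision is the true label; a reduct is found by brute force as the smallest classifier subset that preserves the positive region; and rules map each observed pattern on the reduct to its most frequent label, with a hypothetical fallback to plain voting for unseen patterns. The article's quality measure is not reproduced here, and the toy table is invented.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Hashable, Sequence, Tuple


def _pattern(row: Sequence[Hashable], attrs) -> Tuple:
    """Values of the selected condition attributes (classifier outputs) for one row."""
    return tuple(row[a] for a in attrs)


def positive_region_size(rows, labels, attrs) -> int:
    """Count rows whose equivalence class on `attrs` has exactly one decision value."""
    decisions = defaultdict(set)
    for row, y in zip(rows, labels):
        decisions[_pattern(row, attrs)].add(y)
    return sum(1 for row in rows if len(decisions[_pattern(row, attrs)]) == 1)


def find_reduct(rows, labels) -> Tuple[int, ...]:
    """Brute-force search for the smallest classifier subset that keeps the positive
    region of the full decision table (practical only for a handful of classifiers)."""
    n_attrs = len(rows[0])
    full = positive_region_size(rows, labels, range(n_attrs))
    for size in range(1, n_attrs + 1):
        for subset in combinations(range(n_attrs), size):
            if positive_region_size(rows, labels, subset) == full:
                return subset
    return tuple(range(n_attrs))  # not reached: the full set always qualifies


def generate_rules(rows, labels, reduct) -> Dict[Tuple, Hashable]:
    """One rule per observed pattern on the reduct: pattern -> most frequent label."""
    by_pattern = defaultdict(Counter)
    for row, y in zip(rows, labels):
        by_pattern[_pattern(row, reduct)][y] += 1
    return {p: c.most_common(1)[0][0] for p, c in by_pattern.items()}


def meta_predict(rules, reduct, row) -> Hashable:
    key = _pattern(row, reduct)
    if key in rules:
        return rules[key]
    # Hypothetical fallback for patterns never seen in the decision table.
    return Counter(row).most_common(1)[0][0]


# Toy decision table: outputs of three base classifiers on five validation pages.
rows = [(0, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1)]
labels = [0, 1, 0, 1, 1]
reduct = find_reduct(rows, labels)
rules = generate_rules(rows, labels, reduct)
print(reduct, rules)  # (2,) {(0,): 0, (1,): 1}
```

In this toy table, classifiers 0 and 1 are dropped by the reduct, which mirrors the abstract's point that attribute reduction removes redundant classifiers and so lowers the cost of running the ensemble.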
Delineating Knowledge Domains in Scientific Literature using Machine Learning (ML)
Recent years have witnessed an upsurge in the number of published documents, and organizations are showing increased interest in text classification to make effective use of this information. Manual text classification can work for a handful of documents, but it becomes unreliable as the number of documents grows, besides being laborious and time-consuming. Text mining techniques automate the assignment of text to categories, making classification fast, accurate, and therefore reliable. This paper classifies chemistry documents using machine learning and statistical methods. The text classification procedure is described step by step: data preparation, followed by processing, transformation, and application of classification techniques, culminating in validation of the results.
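The stages listed in this abstract can be illustrated with a small, dependency-free Python pipeline. The stop-word list, the TF-IDF weighting, the nearest-centroid (cosine similarity) classifier, and the "organic"/"inorganic" chemistry labels are all assumptions for illustration; the abstract does not say which machine learning or statistical methods the paper uses, and the documents are invented.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical stop-word list, kept tiny for illustration.
STOPWORDS = {"the", "of", "a", "and", "in", "is", "to"}


def preprocess(text: str) -> List[str]:
    """Processing: lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed inverse document frequency learned from the training documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Transformation: L2-normalised TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_centroids(vectors, labels):
    """Classification (training): mean TF-IDF vector for each class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, y in zip(vectors, labels):
        for t, v in vec.items():
            sums[y][t] += v
    return {y: {t: v / counts[y] for t, v in s.items()} for y, s in sums.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(centroids, vec) -> str:
    """Classification (inference): the class whose centroid is most cosine-similar."""
    return max(centroids, key=lambda y: cosine(vec, centroids[y]))


def accuracy(y_true, y_pred) -> float:
    """Validation: fraction of held-out documents labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Data preparation: invented toy documents with hypothetical labels.
train = [
    ("Synthesis of benzene derivatives via carbon ring reactions", "organic"),
    ("Alkene and alcohol reactions of carbon compounds", "organic"),
    ("Crystal structure of metal oxide and transition metal complexes", "inorganic"),
    ("Metal ion coordination in oxide minerals", "inorganic"),
]
test = [("Carbon ring synthesis of alcohol derivatives", "organic"),
        ("Transition metal oxide crystal", "inorganic")]

train_tokens = [preprocess(text) for text, _ in train]
idf = fit_idf(train_tokens)
centroids = train_centroids([tfidf(t, idf) for t in train_tokens], [y for _, y in train])
predictions = [predict(centroids, tfidf(preprocess(text), idf)) for text, _ in test]
print(predictions, accuracy([y for _, y in test], predictions))  # ['organic', 'inorganic'] 1.0
```

Keeping each stage as a separate function makes it easy to swap in a different transformation or classifier while leaving data preparation and validation unchanged.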
Arabic Text Mining
The rapid growth of the internet has increased the number of online texts, including a rapidly growing number of texts in Arabic. This enormous amount of text must be organized into classes to make analysis and text retrieval easier, so text classification is a key component of text mining. There are numerous systems and approaches for categorizing texts in English, other European languages (French, German, Spanish), and Asian languages (Chinese, Japanese). In contrast, relatively few studies address categorizing Arabic texts, owing to the complexity of the Arabic language. In this work, key ideas relevant to Arabic text mining are briefly explained, and then a new classification system for Arabic is presented that uses light stemming and a Naïve Bayesian classifier (CNB). The corpus includes texts from two classes, politics and sports. Some texts were added to the system, and it classified them correctly, demonstrating its effectiveness.
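A minimal sketch of a light-stemming plus Naïve Bayes pipeline for a two-class (politics/sports) corpus follows. The prefix and suffix lists are an assumption loosely modelled on commonly used Arabic light-stemming affix lists, and the classifier is a standard multinomial Naïve Bayes with add-one smoothing; neither is claimed to be the system described in the abstract. The four training sentences are invented toy data.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

# Assumed affix lists, longest first so that e.g. "وال" is tried before "و".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping a stem of at least 2 letters."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[: -len(s)]
            break
    return word


def tokenize(text: str) -> List[str]:
    return [light_stem(w) for w in text.split()]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, in log space."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.priors = {c: math.log(n / len(labels)) for c, n in Counter(labels).items()}
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for cnt in self.counts.values() for t in cnt}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        return self

    def predict(self, tokens: List[str]) -> str:
        v = len(self.vocab)

        def score(c: str) -> float:
            # Words never seen in training are skipped rather than smoothed.
            return self.priors[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab
            )

        return max(self.priors, key=score)


train = [
    ("الحكومة تناقش الانتخابات البرلمانية", "politics"),
    ("الرئيس يلتقي الوزراء في البرلمان", "politics"),
    ("الفريق يفوز في المباراة النهائية", "sports"),
    ("اللاعب يسجل هدفا في الدوري", "sports"),
]
model = NaiveBayes().fit([tokenize(t) for t, _ in train], [c for _, c in train])
print(model.predict(tokenize("الانتخابات والحكومة")))  # politics
print(model.predict(tokenize("المباراة والفريق")))  # sports
```

Light stemming lets "الحكومة" and "والحكومة" share the stem "حكوم", so the classifier pools their counts instead of treating them as unrelated words; this is the main reason stemming helps on small Arabic corpora.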
- …