5 research outputs found

    Machine learning classifiers: Evaluation of the performance in online reviews

    This paper evaluates the performance of machine learning classifiers and identifies the most suitable classifier for classifying sentiment value. In this study, the term “sentiment value” refers to the polarity (positive, negative or neutral) of a text. The work applies machine learning classifiers from the WEKA (Waikato Environment for Knowledge Analysis) toolkit, a set of tools for data mining and classification. Classifier performance was measured by examining overall accuracy, recall, precision and the kappa statistic, and by applying a few visualization techniques. The analysis is then used to find the most suitable classifier for classifying sentiment value. Results show that two classifiers, from the Rules and Trees categories, perform equally well and better than the classifiers from the other categories, such as Bayes, Functions, Lazy and Meta. The paper explores the performance of machine learning classifiers for sentiment value classification in online reviews. The data used have not previously been used to explore the performance of machine learning classifiers.
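
    The measures named in this abstract (overall accuracy, per-class precision and recall, and the kappa statistic) can be sketched in a few lines. The following is a minimal illustration in plain Python, not the paper's WEKA workflow; the function name `evaluate` and the sample labels are assumptions made for the example.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Compute accuracy, per-class precision/recall and Cohen's kappa.

    Hypothetical helper for illustration; it is not part of WEKA.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (actual, predicted) counts
    true_counts = Counter(y_true)          # row totals of the confusion matrix
    pred_counts = Counter(y_pred)          # column totals of the confusion matrix

    # Observed agreement: fraction of items on the diagonal.
    accuracy = sum(pairs[(lab, lab)] for lab in labels) / n

    # Precision: correct predictions of a class / all predictions of that class.
    precision = {
        lab: pairs[(lab, lab)] / pred_counts[lab] if pred_counts[lab] else 0.0
        for lab in labels
    }
    # Recall: correct predictions of a class / all actual items of that class.
    recall = {
        lab: pairs[(lab, lab)] / true_counts[lab] if true_counts[lab] else 0.0
        for lab in labels
    }

    # Agreement expected by chance, from the marginal totals.
    expected = sum(true_counts[lab] * pred_counts[lab] for lab in labels) / (n * n)
    # Kappa is undefined when expected agreement is 1; reported as 0.0 here.
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


# Made-up polarity labels, only to show the call.
actual = ["pos", "pos", "neg", "neg", "neu", "neu"]
predicted = ["pos", "neg", "neg", "neg", "neu", "pos"]
report = evaluate(actual, predicted)
```

    Kappa corrects the overall accuracy for the agreement expected by chance, which is why it is commonly reported alongside accuracy when class sizes differ.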

    Intelligent classification of ammonia concentration based on odor profile

    This thesis presents the intelligent classification of ammonia concentration based on the standard for wastewater discharge in the oil and gas industry. Intelligent classification using signal processing is a well-known technique in many applications, including the oil and gas industry. Intelligent classification of ammonia concentration is a demanding technique, especially in the environmental sector. The properties and preparation of ammonia solutions commonly used in industry were also studied. The objectives of this thesis are to develop an intelligent classification of ammonia concentration based on the oil and gas industry wastewater discharge schedule, and to analyze the performance of that classification. In this thesis, the ammonia odor profile was pre-identified by a chemist using a four-sensor array. The ammonia concentration was validated using a commercial gas sensor and a spectrophotometer to cross-validate the e-nose instrument. The odor profiles from two sample groups, high concentration (20 ppm and 25 ppm) and low concentration (5 ppm, 10 ppm and 15 ppm), were normalized and visualized in a 2D plot to extract their unique patterns. Based on their variance, the low- and high-concentration ammonia odor profiles were identified as different sample groups. These sample groups were analyzed statistically using a boxplot, a calibration curve and a proximity matrix. The thesis describes the statistical techniques used to visualize the pattern and uses mean features to classify between low and high concentration. Two intelligent classification techniques were used: an Artificial Neural Network (ANN) trained with back-propagation, and CBR, which was used to cross-validate the results of the ANN model. Both the ANN model and the CBR classifier were assessed using several performance measures.
    From the results, it is observed that the ANN model and the CBR classifier are capable of classifying 100% of the ammonia concentration odor profiles from the water. The results can also significantly reduce cost and time, and improve product reliability and customer confidence.

    Machine Learning Classifiers: Evaluation of the Performance in Online Reviews

    Kernel Methods and Measures for Classification with Transparency, Interpretability and Accuracy in Health Care

    Support vector machines are a popular method in machine learning. They learn from data about a subject, for example, lung tumors in a set of patients, to classify new data, such as a new patient’s tumor. The new tumor is classified as either cancerous or benign, depending on how similar it is to the tumors of other patients in those two classes, where similarity is judged by a kernel. The adoption and use of support vector machines in health care, however, is inhibited by a perceived and actual lack of rationale, understanding and transparency for how they work and how to interpret information and results from them. For example, a user must select the kernel, or similarity function, to be used, and there are many kernels to choose from but little to no useful guidance on choosing one. The primary goal of this thesis is to create accurate, transparent and interpretable kernels, with a rationale for selecting them, for classification in health care using SVM, and to do so within a theoretical framework that advances rationale, understanding and transparency for kernel/model selection with atomic data types. The kernels and framework necessarily co-exist. The secondary goal of this thesis is to quantitatively measure model interpretability for kernel/model selection and to identify the types of interpretable information that are available from different models for interpretation. Testing my framework and transparent kernels with empirical data, I achieve classification accuracy that is better than or equivalent to that of Gaussian RBF kernels. I also validate some of the model interpretability measures I propose.
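
    To make the idea of a kernel as a similarity function concrete, the sketch below implements the standard Gaussian RBF kernel, which the abstract names as its baseline. It does not reproduce the transparent kernels proposed in the thesis; the function names, the `gamma` value and the sample feature vectors are assumptions made for the example.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: k(x, z) = exp(-gamma * ||x - z||^2).

    Returns 1.0 for identical points and decays toward 0.0 as the
    points move apart. The default gamma is an arbitrary choice.
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of features")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def gram_matrix(points, gamma=0.5):
    """Pairwise similarities between all points (symmetric matrix)."""
    return [[rbf_kernel(a, b, gamma) for b in points] for a in points]


# Made-up two-feature vectors, only to show the call.
samples = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0]]
similarities = gram_matrix(samples)
```

    A larger `gamma` makes similarity fall off faster with distance, which is one of the choices a user has to make with little guidance, as the abstract notes.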