4,753 research outputs found

    Forecasting movie rating using k-nearest neighbor based collaborative filtering

    Expressing reviews of items used or movies seen, in the form of sentiments or ratings, is a common human habit. These reviews are readily available on social websites. It is important to recommend items to a user based on that user's interest pattern. Recommendation systems play a vital role in everyday life, as demand for interest-based recommendations grows day by day. Movie recommendation based on the available ratings for a movie has become particularly relevant for new users. To date, many recommendation systems have been designed using various machine learning algorithms. Still, the sparsity, cold start, scalability, and grey sheep problems remain hurdles that must be resolved using hybrid algorithms. In this paper we propose a movie rating system using a k-nearest neighbor (KNN) based collaborative filtering (CF) approach. We compare users' ratings for different movies to find the top K most similar users, then use this top-K set to predict a user's missing rating for a movie using CF. When evaluated against various criteria, the proposed system shows promising results for movie recommendation compared with existing systems.
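
    The top-K idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cosine similarity over co-rated movies, the similarity-weighted average, and the names `cosine_similarity`, `predict_rating`, and the sample `ratings` data are all assumptions made for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two users over the movies both have rated.

    `a` and `b` map movie id -> rating. The similarity measure is an
    assumption; the abstract does not say which one the paper uses.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, movie, k=2):
    """Predict a missing rating from the k most similar users who rated `movie`."""
    neighbours = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[movie])
        for other, other_ratings in ratings.items()
        if other != user and movie in other_ratings
    ]
    top_k = sorted(neighbours, key=lambda pair: pair[0], reverse=True)[:k]
    total_similarity = sum(sim for sim, _ in top_k)
    if total_similarity == 0:
        return None  # no usable neighbour (a cold-start case)
    # Similarity-weighted average of the neighbours' ratings.
    return sum(sim * rating for sim, rating in top_k) / total_similarity


# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
    "dave": {"B": 4, "C": 5},
}
prediction = predict_rating(ratings, "alice", "C", k=2)
```

    Returning `None` when no neighbour has rated the movie makes the cold-start case explicit instead of producing a misleading default rating.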

    IMPROVING CORONARY HEART DISEASE PREDICTION BY OUTLIER ELIMINATION

    Heart disease is now the leading cause of death globally. According to the World Health Organization, almost 18 million people die of heart diseases (cardiovascular diseases) every year, so a system for early detection and prevention of heart disease is needed. Detection of heart disease depends largely on large volumes of complex pathological and clinical data, and researchers and medical professionals are therefore showing keen interest in its accurate prediction. Heart disease is a general term for a large number of medical conditions related to the heart, one of which is coronary heart disease (CHD). Coronary heart disease is caused by the build-up of plaque on the artery walls. In this paper, various base and ensemble machine learning classifiers are applied to a heart disease dataset for efficient prediction of coronary heart disease. The base classifiers employed are k-nearest neighbor, multilayer perceptron, multinomial naïve Bayes, logistic regression, decision tree, random forest, and support vector machine. The ensemble classifiers used are majority voting, weighted average, bagging, and boosting. The dataset used in this study is obtained from the Framingham Heart Study, a long-term, ongoing cardiovascular study of people from the city of Framingham in Massachusetts, USA. To evaluate the performance of the classifiers, accuracy, precision, recall, and F1 score are used. According to our results, the best accuracies were achieved by the logistic regression, random forest, majority voting, weighted average, and bagging classifiers, and the highest among these by the weighted average ensemble classifier.
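
    The two simplest ensemble schemes named in this abstract, majority voting and weighted averaging, can be sketched as below. This is an illustrative sketch only: the function names, the 0.5 decision threshold, and the example probabilities and weights are assumptions, and the abstract does not state how the paper chose its weights.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most base classifiers.

    `labels` holds one predicted class label per base classifier.
    """
    return Counter(labels).most_common(1)[0][0]


def weighted_average(probabilities, weights, threshold=0.5):
    """Combine per-classifier positive-class probabilities into one decision.

    Each probability is scaled by its classifier's weight (for instance a
    validation accuracy, which is an assumption here), and the weighted mean
    is compared with `threshold`.
    """
    score = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return int(score >= threshold), score


# Hypothetical outputs of three base classifiers for one patient.
vote = majority_vote([1, 0, 1])
label, score = weighted_average([0.9, 0.4, 0.3], weights=[1, 3, 3])
```

    With equal weights the three example probabilities average above 0.5, whereas weighting the second and third classifiers more heavily pulls the score below the threshold, which is the behaviour that distinguishes weighted averaging from a plain vote.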

    The Effect of Using Data Pre-Processing by Imputations in Handling Missing Values

    The evolution of big data analytics through machine learning and artificial intelligence techniques has led organizations in a wide range of sectors, including health, manufacturing, e-commerce, governance, and social welfare, to realize the value of the massive volumes of data accumulating daily in web-based repositories. This has led to the adoption of data-driven decision models, for example sentiment analysis in marketing, where producers leverage customer feedback and reviews to develop customer-oriented products. However, data generated in real-world activities is subject to errors resulting from inaccurate measurements or faulty input devices, which may result in the loss of some values. Missing attribute or variable values make data unsuitable for decision analytics because of the noise and inconsistencies that create bias. The objective of this paper was to explore the problem of missing data and to develop an advanced imputation model, based on machine learning and implemented with the K-Nearest Neighbor (KNN) algorithm in the R programming language, as an approach to handling missing values. The methodology relied on applying advanced machine learning algorithms, with high accuracy in pattern detection and predictive analytics, to the existing imputation techniques, which handle missing values by random replacement or deletion. According to the results, the advanced imputation technique based on machine learning models replaced missing values in a dataset with 89.5% accuracy. The experimental results showed that pre-processing by imputation performs efficiently in handling missing data values. These findings are consistent with the key idea of the paper, which is to explore alternative imputation techniques for handling missing values in order to improve the accuracy and reliability of decision insights extracted from datasets.
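
    KNN-based imputation of the kind this abstract describes can be sketched as below. The paper's model is implemented in R; this Python version is only an illustration of the general technique, and the function name `knn_impute`, the distance over shared columns, and the sample rows are assumptions.

```python
from math import sqrt


def knn_impute(rows, k=2):
    """Fill each missing value (None) with the mean of that column among
    the k nearest rows that have the column observed.

    Distance is the root-mean-square difference over the columns both rows
    have observed. This is one common choice, assumed for the sketch.
    """
    imputed = [list(row) for row in rows]  # leave the input untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(rows):
                if other_index == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                distance = sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((distance, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical data: the third value of the first row is missing.
rows = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [10.0, 20.0, 50.0],
]
filled = knn_impute(rows, k=2)
```

    Unlike random replacement or deletion, the filled value here comes from the rows most similar to the incomplete one, so the distant fourth row does not influence it.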

    DIAGNOSE EYES DISEASES USING VARIOUS FEATURES EXTRACTION APPROACHES AND MACHINE LEARNING ALGORITHMS

    Ophthalmic diseases such as glaucoma, diabetic retinopathy, and cataracts are the main causes of visual impairment worldwide. Even with fundus images, it can be difficult for a clinician to detect eye diseases early enough. Moreover, diagnosing eye disease is error-prone, challenging, and labor-intensive. An automated, computer-assisted system for detecting ocular disease from fundus images is therefore needed. The strong image classification capabilities of machine learning (ML) algorithms make such a system feasible. Machine learning is an essential area of artificial intelligence (AI). Ophthalmologists will soon be able to deliver accurate diagnoses and support individualized healthcare thanks to the general capacity of machine learning to automatically identify, locate, and grade pathological features in ocular disorders. This work presents an ML-based method for targeted ocular disease detection. The Ocular Disease Intelligent Recognition (ODIR) dataset, which includes 5,000 images in 8 different fundus classes, was classified using machine learning methods. These classes represent various ocular diseases. In this study, the dataset was divided into 70% training data and 30% test data, and all images were preprocessed by conversion from color to grayscale, histogram equalization, blurring, and resizing. The next phase is feature extraction, for which two algorithms are applied: SIFT (Scale-Invariant Feature Transform) and GLCM (Gray Level Co-occurrence Matrix). The ODIR dataset is then classified using Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (KNN). For binary classification (abnormal vs. normal), the accuracies were 75% (NB), 62% (RF), 53% (KNN), and 51% (DT), with NB the highest. For multiclass classification (types of eye disease), the accuracies were 88% (RF), 61% (KNN), 42% (NB), and 39% (DT), with RF the highest.
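
    The GLCM feature extraction step named in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the single horizontal offset, the unnormalised non-symmetric matrix, and the 3-level toy image are all assumptions chosen to keep the example short.

```python
def glcm(image, levels, dx=1, dy=0):
    """Build a gray level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer gray levels in range(levels).
    Entry [i][j] counts how often a pixel with level i has a neighbour
    with level j at offset (dx, dy). Counts are raw (not normalised) and
    the matrix is not symmetrised; both are simplifying assumptions.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: mean squared gray-level difference between paired pixels."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total


# Hypothetical 3x3 image quantised to 3 gray levels.
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
matrix = glcm(image, levels=3)
texture_contrast = contrast(matrix)
```

    Texture statistics such as contrast, computed from the matrix, are the kind of numeric features that can then be passed to classifiers like those listed above.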

    Prediction of Oestrus in Dairy Cows: An Application of Machine Learning to Skewed Data

    The dairy industry requires accurate detection of oestrus (heat) in dairy cows to maximise the output of the animals. Traditionally, this process depends on human observation and interpretation of the various signs of heat. Many areas of the dairy industry can be automated; however, the detection of oestrus still requires human experts. This thesis investigates the application of machine learning classification techniques to dairy cow milking data, provided by the Livestock Improvement Corporation, to predict oestrus. The usefulness of various ensemble learning algorithms, such as Bagging and Boosting, is explored, as well as specific skewed-data techniques. An empirical study into the effectiveness of classifiers designed to target skewed data forms a significant part of the investigation. Roughly Balanced Bagging and the novel Under Bagging classifiers are explored in considerable detail and found to perform quite favourably compared with the SMOTE technique on the datasets selected. This study uses non-dairy, commonplace machine learning datasets, many of which are found in the UCI Machine Learning Repository.
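
    The general idea of combining bagging with undersampling for skewed data can be sketched as below. This is a generic illustration and not necessarily the thesis's Under Bagging variant: the function name `balanced_bags`, the equal class sizes per bag, and the toy data are assumptions.

```python
import random


def balanced_bags(examples, labels, n_bags=3, seed=0):
    """Create training bags that each hold every minority-class example
    plus an equally sized random sample of the majority class.

    One classifier would then be trained per bag and their predictions
    combined, for example by voting. Sampling the majority class without
    replacement is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    labels = list(labels)
    minority_label = min(set(labels), key=labels.count)
    minority = [(x, y) for x, y in zip(examples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y != minority_label]
    bags = []
    for _ in range(n_bags):
        sampled_majority = rng.sample(majority, len(minority))
        bags.append(minority + sampled_majority)
    return bags


# Hypothetical skewed dataset: 2 positive examples, 8 negative ones.
examples = list(range(10))
labels = [1, 1] + [0] * 8
bags = balanced_bags(examples, labels, n_bags=3)
```

    Because each bag draws a different majority sample, the ensemble as a whole still sees most of the majority data while every individual classifier trains on balanced classes.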