3 research outputs found

    Supervised Learning Techniques for Classification Of Students’ Tweets

    Up-to-date information can now be retrieved from social networks, internet communities and data forums. People, especially the younger generation, share their feelings, experiences and day-to-day happenings on social media platforms such as Twitter, which hold a large volume of unstructured data. The proposed system concentrates on the learning process of engineering students and the problems they face during their studies, as expressed in their Twitter posts. Because the collected data is large, the Apache Hadoop MapReduce environment is used for processing. The system pre-processes the tweets, calculates the F1 measure, identifies prominent categories, computes word and category probabilities, and finally classifies the tweets into their respective categories. Supervised learning techniques, namely multiclass SVM with Platt scaling, Naïve Bayes and logistic regression, are used to identify heavy study load, lack of social engagement and sleep problems. Comparing the results, SVM achieves an accuracy of 84%, which is 5 to 10 percent higher than logistic regression and Naïve Bayes.
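
The word and category probabilities mentioned in the abstract can be illustrated with a minimal Naïve Bayes sketch. This is not the authors' implementation: the toy tweets, the whitespace tokenizer, the add-one smoothing and the names `train_naive_bayes` and `classify` are all assumptions made for illustration, and the Hadoop MapReduce layer is left out.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_tweets):
    """Count tweets per category and words per category (hypothetical helper)."""
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labeled_tweets:
        category_counts[category] += 1
        for word in text.lower().split():  # naive whitespace tokenizer (assumption)
            word_counts[category][word] += 1
            vocabulary.add(word)
    return category_counts, word_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log posterior score."""
    category_counts, word_counts, vocabulary = model
    total_tweets = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category, count in category_counts.items():
        # Category probability: share of training tweets in this category.
        score = math.log(count / total_tweets)
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                # Word probability given the category, with add-one smoothing.
                score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented toy data using the three categories named in the abstract.
TOY_TWEETS = [
    ("so many assignments due this week", "heavy study load"),
    ("three labs and two exams tomorrow", "heavy study load"),
    ("no time to see my friends", "lack of social engagement"),
    ("missed the party again because of homework", "lack of social engagement"),
    ("could not sleep all night", "sleep problems"),
    ("awake until 4am again so tired", "sleep problems"),
]

model = train_naive_bayes(TOY_TWEETS)
print(classify("could not sleep last night", model))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.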

    Tree Based Boosting Algorithm to Tackle the Overfitting in Healthcare Data

    Healthcare data refers to information about the health issues, reproductive outcomes, causes of mortality and quality of life of an individual or a population. A variety of health data is collected and used when people interact with healthcare systems. However, healthcare data is noisy and prone to over-fitting. Over-fitting is a modeling error that occurs when a function is fitted too closely to a limited set of data points. The model then learns both the information and the noise in the training data, to the point where its performance on fresh data degrades. The tree-based boosting approach works well on data prone to over-fitting and is well suited to healthcare data. Improved Paloboost performs gradient trimming and updates the learning rate using out-of-bag errors computed on out-of-bag data, that is, the data not present in the in-bag sample. Improved Paloboost thereby protects against over-fitting in noisy healthcare data and outperforms all tree baseline models. According to experimental results on healthcare datasets, Improved Paloboost is better at avoiding over-fitting and is less sensitive.
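
The in-bag and out-of-bag split described above can be sketched as follows. The abstract does not give Improved Paloboost's actual update rule, so the candidate search in `choose_learning_rate` is only a stand-in. All function names, the 70% in-bag fraction and the toy numbers are assumptions.

```python
import random


def split_in_bag(n_samples, fraction, rng):
    """Randomly split sample indices into in-bag and out-of-bag sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = int(n_samples * fraction)
    return sorted(indices[:cut]), sorted(indices[cut:])


def oob_squared_error(y, current, update, rate, oob):
    """Mean squared error on out-of-bag samples after a scaled update.

    Assumes `oob` is non-empty.
    """
    return sum((y[i] - (current[i] + rate * update[i])) ** 2 for i in oob) / len(oob)


def choose_learning_rate(y, current, update, oob, candidates):
    """Pick the candidate rate with the lowest out-of-bag error (illustrative rule)."""
    return min(candidates, key=lambda r: oob_squared_error(y, current, update, r, oob))


rng = random.Random(0)
in_bag, oob = split_in_bag(10, 0.7, rng)

# Toy boosting stage: targets are 1.0, the current prediction is 0.0 and the
# new tree proposes an update of 2.0, so a rate of 0.5 fits the targets exactly.
y = [1.0] * 10
current = [0.0] * 10
update = [2.0] * 10
best_rate = choose_learning_rate(y, current, update, oob, [0.1, 0.5, 1.0])
print(best_rate)
```

Because the out-of-bag samples were not used to fit the stage, their error acts as a held-out signal: a rate that only helps on the in-bag data is not rewarded.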