
    Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm

    Twitter is a popular social network platform where users can interact and post texts of up to 280 characters called tweets. Hashtags, hyperlinked words in tweets, have become increasingly crucial for tweet retrieval and search. Using hashtags for tweet topic classification is challenging because of the context dependence among words, slang, abbreviations and emoticons in a short tweet, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental big data stream problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results with up to 97% accuracy and a 37% increase in throughput on eight processors. Comment: IEEE International Conference on Big Data 201
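The incremental Naïve Bayes idea in this abstract can be illustrated with a small single-process sketch. It is not the paper's implementation: the class `OnlineNaiveBayes`, the `hashtags` tokenizer, and the four-tweet stream are all hypothetical, the distribution over Apache Storm is omitted, and the paper's selection of "strong predictors" is not modeled. The sketch only shows how per-topic counts can be updated one tweet at a time and then used for prediction.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one tweet at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()             # tweets seen per topic
        self.token_counts = defaultdict(Counter)  # token counts per topic
        self.vocab = set()                        # all tokens seen so far

    def update(self, tokens, label):
        """Fold one labelled tweet into the running counts."""
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        """Return the topic with the highest smoothed log posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)           # log prior
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hashtags(tweet):
    """Extract lowercase hashtags as features (hypothetical tokenizer)."""
    return [w.lower() for w in tweet.split() if w.startswith("#") and len(w) > 1]


# Made-up labelled stream, used only to exercise the sketch.
model = OnlineNaiveBayes()
stream = [
    ("Great goal tonight #football #worldcup", "sports"),
    ("New phone launch #tech #gadgets", "technology"),
    ("Penalty drama #football", "sports"),
    ("Chip shortage continues #tech", "technology"),
]
for text, topic in stream:
    model.update(hashtags(text), topic)

prediction = model.predict(hashtags("What a match #football"))
```

Because `update` only increments counters, each incoming tweet costs time proportional to its number of hashtags, which is what makes this style of model suitable for stream processing.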

    An Advanced Conceptual Diagnostic Healthcare Framework for Diabetes and Cardiovascular Disorders

    Data mining, along with emerging computing techniques, has strongly influenced the healthcare industry. Researchers have used various data mining and Internet of Things (IoT) techniques to develop automated solutions for diabetes and heart patients. However, a more advanced and unified solution is still needed that can offer a therapeutic opinion to individual diabetic and cardiac patients. Therefore, a smart data mining and IoT (SMDIoT) based advanced healthcare system for diabetes and cardiovascular diseases is proposed here. The hybridization of data mining and IoT with other emerging computing techniques is expected to give an effective and economical solution for diabetes and cardiac patients. SMDIoT combines the ideas of data mining, the Internet of Things, chatbots, contextual entity search (CES), bio-sensors, semantic analysis and granular computing (GC). The bio-sensors of the proposed system help obtain the current and precise status of the patients concerned so that, in case of an emergency, the necessary medical assistance can be provided. The novelty lies in the hybrid framework and the support of chatbots, granular computing, contextual entity search and semantic analysis. The practical implementation of this system is very challenging and costly. However, it appears to be a more effective and economical solution for diabetes and cardiac patients. Comment: 11 PAGE

    Opinion Mining of Online Stores on Social Media Using the Naïve Bayes Algorithm (Case Study: Dugal Delivry Facebook Account)

    The Internet era has had an impact on various sectors of human life, one of which is the economic sector. Economic transactions have shifted from the traditional face-to-face pattern to online. Customers no longer need to ask a close friend or family member about the condition of an item they want to purchase; they can simply review the product through fellow buyers' comments. Products that get good reviews are taken to be of good quality. However, a problem arises when the volume of comments is very large, which makes it difficult for customers to summarize the quality. Therefore, an automatic opinion mining system is required that can directly give conclusions about the quality of a product. This research builds an opinion mining system by applying the Naïve Bayes algorithm, taking the Facebook account Dugal Delivry as a case study. Measurement with a confusion matrix gives a precision of 88.89%, a recall of 80% and an accuracy of 85%.
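For readers unfamiliar with how precision, recall and accuracy are derived from a confusion matrix, the following sketch computes them from the four cell counts. The counts used (8 true positives, 1 false positive, 2 false negatives, 9 true negatives) are hypothetical: the abstract does not report the actual matrix, and these values were chosen only because they happen to reproduce the reported percentages.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)                  # correct among predicted positives
    recall = tp / (tp + fn)                     # correct among actual positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # correct among all samples
    return precision, recall, accuracy


# Hypothetical counts; the real matrix is not given in the abstract.
precision, recall, accuracy = confusion_metrics(tp=8, fp=1, fn=2, tn=9)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
# prints: precision=88.89% recall=80.00% accuracy=85.00%
```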

    Using Machine Learning in Disaster Tweets Classification

    People share real-time updates on social media platforms (e.g. Twitter) when a disaster occurs. This information is very valuable for disaster relief and response teams, as it can alert them immediately and help them prioritize tasks. Text mining and machine learning algorithms can scan the huge volume of unstructured data generated on social media platforms such as Twitter to spot such information through keywords and phrases that refer to disasters. One challenge is determining whether a tweet is about a real disaster or uses those keywords as a metaphor, which can lead to substantial mislabeling of tweets. Hence, this research aims to use Natural Language Processing (NLP) and classification models to distinguish between real and fake disaster tweets. The dataset was acquired from the Kaggle website, and it contains tweets related to real disasters as well as tweets that refer to fake disasters. Using RStudio, exploratory data analysis (EDA), feature selection, and data cleaning were performed prior to data modeling, and two different training-to-testing splits were tested. Four classifiers were built: SVM, KNN, Naïve Bayes, and XGBoost. The best accuracies were achieved with an 80/20 split and with the whole dataset rather than a sample: SVM and XGBoost performed well, with accuracies of 80% and 78% respectively, while KNN suffered from overfitting (99% accuracy) and Naïve Bayes performed poorly (65%).

    Multi Faceted Text Classification using Supervised Machine Learning Models

    In recent years, document management tasks (known as information retrieval) have increased considerably due to the availability of digital documents everywhere. Automatic methods for extracting document information have become prominent for organizing information and for knowledge discovery. Text classification is one such solution, in which natural language text is assigned to one or more predefined categories based on its content. In my research, text classification is mainly focused on sentiment label classification. The idea proposed for sentiment analysis is multi-class classification of online movie reviews. Many research papers have discussed classifying sentiment as either positive or negative, but in this approach user reviews are classified by sentiment into multiple classes: positive, negative, neutral, very positive and very negative. This classification task would help businesses classify user reviews in the same way as star ratings, which are given manually by users. This paper also proposes a better classification approach with a multi-tier prediction model. The goal of this research is to provide a better understanding of classification for sentiment analysis by applying different preprocessing techniques and selecting suitable features such as bag of words, stemming and stop-word removal, POS tagging, etc. These features are adjusted to fit machine learning text classification algorithms such as Naïve Bayes, SVM, and SGD on frameworks like WEKA, SVMLight and Scikit Learn.
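Two of the preprocessing steps named in this abstract, bag of words and stop-word removal, can be sketched in a few lines. This is an illustrative example only, not the paper's pipeline: the `STOP_WORDS` set is a small made-up subset, the `bag_of_words` helper and the sample review are hypothetical, and stemming and POS tagging are left out.

```python
import re
from collections import Counter

# Illustrative subset of stop words; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to"}


def bag_of_words(review):
    """Lowercase the review, tokenize it, and count non-stop-word tokens."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


features = bag_of_words("The plot was great and the acting was great")
```

The resulting token counts are the kind of feature vector that a classifier such as Naïve Bayes, SVM, or SGD would then consume.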

    BLOG INFORMATION CLASSIFICATION

    Information classification is the categorization of huge amounts of data in an efficient and useful way. Data is currently growing exponentially due to the rise of rich internet applications. One such source of information is blogs. Blogs are web logs maintained by their authors that contain information related to a certain topic as well as the author's view of that topic. Micro blogs, on the other hand, are variations of blogs that contain less data than blogs. Nevertheless, they also contain rich information. In this project, Twitter, a micro blogging website, has been targeted to gather information on certain trending topics. The information is in the form of tweets. A tweet is a post or status update on the Twitter website. These tweets are extracted using the Twitter Search APIs. The data is then classified into different classes based on its content. Using the classified data, features are extracted from the tweets and suggestions are given to users based on the trending topics.