
    SentiMLBench: Benchmark Evaluation of Machine Learning Algorithms for Sentiment Analysis

    Sentiment analysis has been a topic of interest for researchers due to its increasing use by industry. When it comes to measuring end-user sentiment, there is no clear verdict on which algorithms perform better in real-time scenarios. A rigorous benchmark evaluation of algorithms running across multiple datasets and different hardware architectures is needed to guide future researchers on their potential advantages and limitations. This paper proposes SentiMLBench, a critical evaluation of key ML algorithms as standalone classifiers and of a novel cascade feature selection (CFS) based ensemble technique, conducted in multiple benchmark environments, each using a different Twitter dataset and processing hardware. Experimental results show that the best trained ensemble model with CFS enhancement surpasses current state-of-the-art models. In one study, although the ensemble model provides good accuracy, it falls short of neural network accuracy by 2%. The accuracy of the ML algorithms as standalone classifiers is poor across all three studies. The supremacy of neural networks is further confirmed in study three, where they outperform the other algorithms in accuracy by over 10%. Graphics processing units provide speed and higher computational power at a fraction of the cost of a conventional processor, yielding critical architectural insights for developing a robust expert system for sentiment analysis.
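    The abstract does not publish the CFS ensemble itself, but the standalone-classifier benchmarking it describes can be illustrated with a minimal sketch. The snippet below times and scores several common scikit-learn classifiers on a tweet sentiment dataset; the file name "tweets.csv", its "text"/"label" columns, and the particular model choices are illustrative assumptions, not the paper's exact setup.

```python
# Hypothetical sketch: benchmarking standalone ML classifiers on a tweet
# sentiment dataset, in the spirit of SentiMLBench. Dataset path, column
# names, and model selection are assumptions, not the paper's actual setup.
import time
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

df = pd.read_csv("tweets.csv")  # assumed columns: "text", "label"
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Shared TF-IDF features so the comparison isolates the classifiers.
vectorizer = TfidfVectorizer(max_features=20_000, ngram_range=(1, 2))
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(Xtr, y_train)
    elapsed = time.perf_counter() - start
    acc = accuracy_score(y_test, model.predict(Xte))
    print(f"{name}: accuracy={acc:.3f}, train time={elapsed:.2f}s")
```

    Recording wall-clock training time alongside accuracy is what lets this kind of harness compare hardware architectures (e.g., CPU vs. GPU backends) as well as algorithms.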

    The role of approximate negators in modeling the automatic detection of negation in tweets

    Although improvements have been made in the performance of sentiment analysis tools, the automatic detection of negated text (which affects negative sentiment prediction) still presents challenges. More research is needed on new forms of negation beyond prototypical negation cues such as “not” or “never.” The present research reports findings on the role of a set of words called “approximate negators,” namely “barely,” “hardly,” “rarely,” “scarcely,” and “seldom,” which, on specific occasions (such as when attached to a word from the non-affirmative “any” family), can operationalize negation styles not yet explored. Using a corpus of 6,500 tweets, human annotation allowed for the identification of 17 recurrent usages of these words as negatives (such as “very seldom”) which, along with findings from the literature, helped engineer specific features that guided a machine learning classifier in predicting negated tweets. The machine learning experiments also modeled negation scope (i.e., which specific words in the text are negated) by employing lexical and dependency graph information. Promising results included F1 values ranging from 0.71 to 0.89 for negation detection and from 0.79 to 0.88 for scope detection. Future work will be directed toward the application of these findings in automatic sentiment classification, further exploration of patterns in the data (such as part-of-speech recurrences for these new types of negation), and the investigation of sarcasm, formal language, and exaggeration as themes that emerged from observations during corpus annotation.
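    The engineered features themselves are not listed in the abstract, but the kind of pattern it describes (an approximate negator combined with an “any”-family word, or an intensified form like “very seldom”) can be sketched as simple lexical feature extraction. In the snippet below, the feature names, the “any”-family word list, and the intensifier list are illustrative assumptions rather than the paper's actual feature set.

```python
# Hypothetical sketch: lexical features flagging approximate-negator usages
# of the kind that could guide a negation-detection classifier. Feature
# names and word lists are illustrative assumptions, not the paper's own.
import re

APPROX_NEGATORS = {"barely", "hardly", "rarely", "scarcely", "seldom"}
ANY_FAMILY = {"any", "anything", "anyone", "anybody", "anywhere", "anymore"}
INTENSIFIERS = {"very", "so", "really"}

def approximate_negator_features(tweet: str) -> dict:
    tokens = re.findall(r"[a-z']+", tweet.lower())
    feats = {
        "has_approx_negator": False,
        "negator_followed_by_any": False,  # e.g. "hardly any"
        "intensified_negator": False,      # e.g. "very seldom"
    }
    for i, tok in enumerate(tokens):
        if tok in APPROX_NEGATORS:
            feats["has_approx_negator"] = True
            if i + 1 < len(tokens) and tokens[i + 1] in ANY_FAMILY:
                feats["negator_followed_by_any"] = True
            if i > 0 and tokens[i - 1] in INTENSIFIERS:
                feats["intensified_negator"] = True
    return feats

print(approximate_negator_features("There's hardly any signal out here"))
# {'has_approx_negator': True, 'negator_followed_by_any': True,
#  'intensified_negator': False}
```

    Features like these handle negation detection only; modeling negation scope, as the paper does, would additionally require dependency-graph information from a parser.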