109 research outputs found

    A Systematic Literature Review on Cyberbullying in Social Media: Taxonomy, Detection Approaches, Datasets, And Future Research Directions

    In Natural Language Processing, sentiment analysis, also called opinion mining, aims to extract human thoughts, beliefs, and perceptions from unstructured text. With the rapid growth of social media and the influx of individual comments, reviews, and feedback, it has evolved into an attractive and challenging research area. Finding toxic textual content is one of the most common problems in social media. Anonymity and concealment of identity are common on the Internet among people from a wide range of cultures and beliefs. Freedom of speech, anonymity, and inadequate social media regulation make toxic online environments and cyberbullying significant issues that require systems for automatic detection and prevention. Diverse research is taking place on this problem, based on different approaches and languages, but a comprehensive analysis that examines it from all angles is lacking. This systematic literature review is therefore conducted with the aim of surveying the research done to date by the research community on the classification of cyberbullying in the textual modality. It states the definition, taxonomy, properties, and outcomes of cyberbullying and the roles in cyberbullying, along with other forms of bullying and different offensive behavior in social media. This article also presents the latest popular benchmark datasets on cyberbullying, along with their number of classes (binary/multiple), reviews the state-of-the-art methods for detecting cyberbullying and abusive content on social media, and discusses the factors that drive offenders to engage in offensive activity, preventive actions to avoid online toxicity, and cyber laws in different countries. Finally, we identify and discuss the challenges, solutions, and future research directions that serve as a reference for overcoming cyberbullying in social media.

    Man vs machine – Detecting deception in online reviews

    This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models, information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material.

    Sentiment Analysis for Social Media

    Sentiment analysis is a branch of natural language processing concerned with the study of the intensity of the emotions expressed in a piece of text. The automated analysis of the multitude of messages delivered through social media is one of the hottest research fields, both in academia and in industry, due to its extremely high potential applicability in many different domains. This Special Issue describes both technological contributions to the field, mostly based on deep learning techniques, and specific applications in areas like health insurance, gender classification, recommender systems, and cyber aggression detection.

    Reinventing the Social Scientist and Humanist in the Era of Big Data

    This book explores the big data evolution by interrogating the notion that big data is a disruptive innovation that appears to be challenging existing epistemologies in the humanities and social sciences. Exploring various (controversial) facets of big data such as ethics, data power, and data justice, the book attempts to clarify the trajectory of the epistemology of (big) data-driven science in the humanities and social sciences.

    A Socio-mathematical and Structure-Based Approach to Model Sentiment Dynamics in Event-Based Text

    Natural language texts are often meant to express or impact the emotions of individuals. Recognizing the underlying emotions expressed in or triggered by textual content is essential if one is to arrive at an understanding of the full meaning that textual content conveys. Sentiment analysis (SA) researchers are becoming increasingly interested in investigating natural language processing techniques as well as emotion theory in order to detect, extract, and classify the sentiments that natural language text expresses. Most SA research is focused on the analysis of subjective documents from the writer’s perspective and their classification into categorical labels or sentiment polarity, in which text is associated with a descriptive label or a point on a continuum between two polarities. Researchers often perform sentiment or polarity classification tasks using machine learning (ML) techniques, sentiment lexicons, or hybrid-based approaches. Most ML methods rely on count-based word representations that fail to take word order into account. Despite the successful use of these flat word representations in topic-modelling problems, SA problems require a deeper understanding of sentence structure, since the entire meaning of words can be reversed through negations or word modifiers. On the other hand, approaches based on semantic lexicons are limited by the relatively small number of words they contain, which do not begin to embody the extensive and growing vocabulary on the Internet. The research presented in this thesis represents an effort to tackle the problem of sentiment analysis from a different viewpoint than those underlying current mainstream studies in this research area. A cross-disciplinary approach is proposed that incorporates affect control theory (ACT) into a structured model for determining the sentiment polarity of event-based articles from the perspectives of readers and interactants. 
A socio-mathematical theory, ACT provides valuable resources for handling interactions between words (event entities) and for predicting situational sentiments triggered by social events. ACT models human emotions arising from social event terms through the use of multidimensional representations that have been verified both empirically and theoretically. To model human emotions regarding textual content, the first step was to develop a fine-grained event extraction algorithm that extracts events and their entities from event-based textual information using semantic and syntactic parsing techniques. The results of the event extraction method were compared against a supervised learning approach on two human-coded corpora (a grammatically correct and a grammatically incorrect structured corpus). For both corpora, the semantic-syntactic event extraction method yielded a higher degree of accuracy than the supervised learning approach. The three-dimensional ACT lexicon was also augmented in a semi-supervised fashion using graph-based label propagation built from semantic and neural network word embeddings. The word embeddings were obtained through the training of commonly used count-based and neural-network-based algorithms on a single corpus, and each method was evaluated with respect to the reconstruction of a sentiment lexicon. The results show that, relative to other word embeddings and state-of-the-art methods, combining both semantic and neural word embeddings yielded the highest correlation scores and lowest error rates. Using the augmented lexicon and ACT mathematical equations, human emotions were modelled according to different levels of granularity (i.e., at the sentence and document levels). The initial stage involved the development of a proposed entity-based SA approach that models reader emotions triggered by event-based sentences. 
The emotions are modelled in a three-dimensional space based on reader sentiment toward different entities (e.g., subject and object) in the sentence. The new approach was evaluated using a human-annotated news-headline corpus; the results revealed the proposed method to be competitive with benchmark ML techniques. The second phase entailed the creation of a proposed ACT-based model for predicting the temporal progression of the emotions of the interactants and their optimal behaviour over a sequence of interactions. The model was evaluated using three different corpora: fairy tales, news articles, and a handcrafted corpus. The results produced by the proposed model demonstrate that, despite the challenging sentence structure, a reasonable agreement was achieved between the estimated emotions and behaviours and the corresponding ground truth.

    User behavior modeling: Towards solving the duality of interpretability and precision

    User behavior modeling has become an indispensable tool with the proliferation of socio-technical systems that provide a highly personalized experience to users. These socio-technical systems are used in sectors as diverse as education, health, law, e-commerce, and social media. The two main challenges for user behavior modeling are building an in-depth understanding of online user behavior and using advanced computational techniques to capture behavioral uncertainties accurately. This thesis addresses both these challenges by developing interpretable models that aid in understanding user behavior at scale and by developing sophisticated models that perform accurate modeling of user behavior. Specifically, we first propose two distinct interpretable approaches to understand explicit and latent user behavioral characteristics. Firstly, in Chapter 3, we propose an interpretable Gaussian Hidden Markov Model-based cluster model leveraging user activity data to identify users with similar patterns of behavioral evolution. We apply our approach to identify researchers with similar patterns of research-interest evolution. We further show the utility of our interpretable framework to identify differences in gender distribution and the value of awarded grants among the identified archetypes. We also demonstrate the generality of our approach by applying it to StackExchange to identify users with a similar change in usage patterns. Next, in Chapter 4, we estimate user latent behavioral characteristics by leveraging user-generated content (questions or answers) in Community Question Answering (CQA) platforms. In particular, we estimate the latent aspect-based reliability representations of users in the forum to infer the trustworthiness of their answers. We also simultaneously learn the semantic meaning of their answers through text representations. We empirically show that the estimated behavioral representations can accurately identify topical experts. 
We further propose to improve current behavioral models by modeling explicit and implicit user-to-user influence on user behavior. To this end, in Chapter 5, we propose a novel attention-based approach to incorporate influence from both a user's social connections and other similar users on their preferences in recommender systems. Additionally, we incorporate implicit influence in the item space by considering frequently co-occurring and similar feature items. Our modular approach captures the different influences efficiently and later fuses them in an interpretable manner. Extensive experiments show that incorporating user-to-user influence outperforms approaches relying solely on user data. User behavior remains broadly consistent across the platform. Thus, incorporating user behavioral information can be beneficial for estimating the characteristics of user-generated content. To verify this, in Chapter 6, we focus on the task of best answer selection in CQA forums, which traditionally considers only textual features. We induce multiple connections between user-generated content, i.e., answers, based on the similarity and contrast in the behavior of authoring users in the platform. These induced connections enable information sharing between connected answers and, consequently, aid in estimating the quality of the answer. We also develop convolution operators to encode these semantically different graphs and later merge them using boosting. We also propose an alternative approach to incorporate user behavioral information by jointly estimating the latent behavioral representations of users with text representations in Chapter 7. We evaluate our approach on the offensive language prediction task on Twitter. Specifically, we learn an improved text representation by leveraging syntactic dependencies between the words in the tweet. We also estimate the abusive behavior of users, i.e., their likelihood of posting offensive content online, from their tweets. 
We further show that combining the textual and user behavioral features can outperform the sophisticated textual baselines.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.