22 research outputs found

    SENTIMENT ANALYSIS OF CHINESE MICROBLOG MESSAGE USING NEURAL NETWORK-BASED VECTOR REPRESENTATION FOR MEASURING REGIONAL PREJUDICE

    Regional prejudice is prevalent in Chinese cities in which native residents and migrants lack a basic level of trust in the other group. Like Twitter, Sina Weibo is a social media platform where people actively engage in discussions on various social issues. Thus, it provides a good data source for measuring individuals’ regional prejudice on a large scale. We find that a resentful tone dominates in Weibo messages related to migrants. In this paper, we propose a novel approach, named DKV, for recognizing the polarity and direction of sentiment in Weibo messages using distributed real-valued vector representations of keywords learned from neural networks. Such a representation can project rich context information (or embedding) into the vector space, and subsequently be used to infer similarity measures among words, sentences, and even documents. We provide a comprehensive performance evaluation to demonstrate that, by exploiting the keyword embeddings, DKV paired with support vector machines can effectively classify a Weibo message by its predefined sentiment and direction. Results demonstrate that our method achieves the best performance compared with other approaches.

    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics emerge as a technological approach to support human activities in extracting useful knowledge from texts in several formats. From a managerial point of view, it can help organizations in planning and decision-making processes, providing information that was not previously evident through textual materials produced internally or even externally. In this context, within the public/governmental scope, public security agencies are great beneficiaries of the tools associated with text mining, in several aspects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article bases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library), selecting 194 materials published between 2014 and the first half of 2021, among journals, conferences, and book chapters. There were several findings concerning the targets of the literature review, as presented in the results of this article.

    AI approaches to understand human deceptions, perceptions, and perspectives in social media

    Social media platforms have created a virtual space for sharing user-generated information, connecting, and interacting among users. However, there are research and societal challenges: 1) users generate and share disinformation; 2) it is difficult to understand citizens' perceptions or opinions expressed on a wide variety of topics; and 3) there are information overload and echo chamber problems, without an overall understanding of the different perspectives taken by different people or groups. This dissertation addresses these three research challenges with advanced AI and Machine Learning approaches. To address fake news, as deceptions on the facts, this dissertation presents Machine Learning approaches for fake news detection models, and a hybrid method for topic identification, whether they are fake or real. To understand the user's perceptions or attitude toward some topics, this study analyzes the sentiments expressed in social media text. The sentiment analysis of posts can be used as an indicator to measure how topics are perceived by the users and how their perceptions as a whole can affect decision makers in government and industry, especially during the COVID-19 pandemic. It is difficult to measure the public perception of government policies issued during the pandemic. The citizen responses to the government policies are diverse, ranging from security or goodwill to confusion, fear, or anger. This dissertation provides a near real-time approach to track and monitor public reactions toward government policies by continuously collecting and analyzing Twitter posts about the COVID-19 pandemic. To address social media's overwhelming number of posts, content echo chambers, and information isolation, this dissertation provides a multiple view-based summarization framework where the same contents can be summarized according to different perspectives. This framework includes components for choosing the perspectives and advanced text summarization approaches. The proposed approaches in this dissertation are demonstrated with a prototype system that continuously collects Twitter data about COVID-19 government health policies and provides analysis of citizen concerns toward the policies; the data is also analyzed for fake news detection and for generating multiple-view summaries.

    Fuzzy-based machine learning for predicting narcissistic traits among Twitter users.

    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. Social media has provided a platform for people to share views and opinions they identify with or which are significant to them. Similarly, social media enables individuals to express themselves authentically and divulge their personal experiences in a variety of ways. This behaviour, in turn, reflects the user’s personality. Social media has in recent times been used to perpetuate various forms of crimes, and a narcissistic personality trait has been linked to violent criminal activities. This negative side effect of social media calls for multiple ways to respond and prevent damage instigated. Eysenck's theory on personality and crime postulated that various forms of crime are caused by a mixture of environmental and neurological causes. This theory suggests certain people are more likely to commit a crime, and personality is the principal factor in criminal behaviour. Twitter is a widely used social media platform for sharing news, opinions, feelings, and emotions by users. Given that narcissists have an inflated self-view and engage in a variety of strategies aimed at bringing attention to themselves, features unique to Twitter are more appealing to narcissists than those on sites such as Facebook. This study adopted design science research methodology to develop a fuzzy-based machine learning predictive model to identify traces of narcissism from Twitter using data obtained from the activities of a user. Performance evaluation of various classifiers was conducted and an optimal classifier with 95% accuracy was obtained. The research found that the size of the dataset and input variables have an influence on classifier accuracy. In addition, the research developed an updated process model and recommended a research model for narcissism classification.

    Probing the Limits of Social Data: Biases, Methods, and Domain Knowledge

    Online social data has been hailed as providing unprecedented insights into human phenomena due to its ability to capture human behavior at a scale and level of detail, both in breadth and depth, that is hard to achieve through conventional data collection techniques. This has led to numerous studies that leverage online social data to model or gain insights about real-world phenomena, as well as to inform system or methods design for performance gains, or for providing personalized services. Alas, regardless of how large, detailed or varied the online social data is, there are limits to what can be discerned from it about real-world, or even media- or application-specific phenomena. This thesis investigates four instances of such limits that are related to both the properties of the working data sets and of the methods used to acquire and leverage them, including: (1) online social media biases, (2) assessing and (3) reducing data collection biases, and (4) methods' sensitivity to data biases and variability. For each of them, we conduct a separate case study that enables us to systematically devise and apply consistent methodologies to collect, process, compare or assess different data sets and dedicated methods. The main contributions of this thesis are: (i) To gain insights into media-specific biases, we run a comparative study juxtaposing social and mainstream media coverage of domain-specific news events for a period of 17 months. To this end, we introduce a generic methodology for comparing news agendas online based on a comparison of spikes of coverage. We expose significant differences in the type of events that are covered by the two media. (ii) To assess possible biases across data collections, we run a transversal study that systematically assembles and examines 26 distinct data sets of social media posts during a variety of crisis events spanning a two-year period. While we find patterns and consistencies, we also uncover substantial variability across different event data sets, highlighting the pitfalls of generalizing findings from one data set to another. (iii) To improve data collections, we introduce a method that increases the recall of social media samples, while preserving the original distribution of message types and sources. To locate and monitor domain-specific events, this method constructs and applies a domain-specific, yet generic lexicon, automatically learning event-specific terms and adapting the lexicon to the targeted event. The resulting improvements also show that only a fraction of the relevant data is currently mined. (iv) To test the methods' sensitivity to data biases and variability, we run an empirical evaluation on 6 real-world data sets dissecting the impact of user and item attributes on the performance of recommendation approaches that leverage distinct social cues: explicit social links vs. implicit interest affinity. We show performance variations not only across data sets, but also within each data set, across different classes of users or items, suggesting that global metrics are often unsuited for assessing recommendation system performance. The overarching goal of this thesis is to contribute a practical perspective to the body of research that aims to quantify biases, to devise better methods to collect and model social data, and to evaluate such methods in context.

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.

    Pattern Discrimination

    Algorithmic identity politics reinstate old forms of social segregation - in a digital world, identity politics is pattern discrimination. It is by recognizing patterns in input data that Artificial Intelligence algorithms create bias and practice racial exclusions, thereby inscribing power relations into media. How can we filter information out of data without reinserting racist, sexist, and classist beliefs?

    Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it 2021

    The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the edition of 2020, which was held in fully virtual mode due to the health emergency related to Covid-19, CLiC-it 2021 represented the first moment for the Italian research community of Computational Linguistics to meet in person after more than one year of full/partial lockdown.

    Tune your brown clustering, please

    Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.