87 research outputs found

    Multilingual Twitter Sentiment Classification: The Role of Human Annotators

    What are the limits of automated Twitter sentiment classification? We analyze a large set of manually labeled tweets in different languages, use them as training data, and construct automated classification models. It turns out that the quality of classification models depends much more on the quality and size of the training data than on the type of model trained. Experimental results indicate that there is no statistically significant difference between the performance of the top classification models. We quantify the quality of training data by applying various annotator agreement measures, and identify the weakest points of different datasets. We show that the model performance approaches the inter-annotator agreement when the size of the training set is sufficiently large. However, it is crucial to regularly monitor the self- and inter-annotator agreements since this improves the training datasets and consequently the model performance. Finally, we show that there is strong evidence that humans perceive the sentiment classes (negative, neutral, and positive) as ordered.
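As a rough illustration of what an annotator agreement measure computes, the sketch below implements Cohen's kappa for two annotators. The abstract above does not say which measures the authors applied, so the choice of kappa, the function name `cohen_kappa`, and the sample labels are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Illustrative only: the paper's own agreement measures are not
    specified in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability that both pick the same class
    # if each annotator labeled at random with their own class rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical class throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical sentiment labels for six tweets from two annotators.
annotator_1 = ["neg", "neu", "pos", "pos", "neu", "neg"]
annotator_2 = ["neg", "neu", "pos", "neu", "neu", "pos"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa treats the three classes as unordered. Since the abstract reports that annotators perceive the classes as ordered, a weighted or ordinal measure would likely be a better fit in practice.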

    Predictive Analytics on Emotional Data Mined from Digital Social Networks with a Focus on Financial Markets

    This cumulative dissertation comprises five articles. User-Generated Content (UGC) makes up a substantial part of communication via social media. In this dissertation, UGC that carries and facilitates the exchange of emotions is referred to as “emotional data.” People “produce” emotional data, that is, they express their emotions via tweets, forum posts, blogs, and so on, or they “consume” it by being influenced by expressed sentiments, feelings, opinions, and the like. Decisions often depend on shared emotions and data, which in turn lead to new data because decisions may change behaviors or results. “Emotional Data Intelligence” ultimately seeks an answer to the question of how all the different emotions expressed in public online sources influence decision-making processes. The overarching research question of this dissertation is whether network structures and emotional sentiment data extracted from digital social networks contain predictive information or are just noise. The underlying data was collected from different social media sources, such as Twitter, blogs, message boards, and online news, and from social networking sites, such as Xing. Using methods from social network analysis (SNA), sentiment analysis, and predictive analysis, the individual contributions of this dissertation study whether sentiment data from social media or online social networking structures can predict real-world behaviors. The focus lies on the analysis of emotional data and network structures and their predictive power for financial markets. Through the formal construction of the data analysis methodologies introduced in the individual contributions, this dissertation contributes to the theories of social network analysis, sentiment analysis, and predictive analytics.

    Twitter and society


    Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing

    This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence those of the applied frameworks.

    A framework for smart traffic management using heterogeneous data sources

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Traffic congestion constitutes a social, economic and environmental issue for modern cities as it can negatively impact travel times, fuel consumption and carbon emissions. Traffic forecasting and incident detection systems are fundamental areas of Intelligent Transportation Systems (ITS) that have been widely researched in the last decade. These systems provide real-time information about traffic congestion and other unexpected incidents, which can help traffic management agencies activate strategies and notify users accordingly. However, existing techniques suffer from high false alarm rates and incorrect traffic measurements. In recent years, there has been an increasing interest in integrating different types of data sources to achieve higher precision in traffic forecasting and incident detection techniques. In fact, a considerable amount of literature has grown around the influence of integrating data from heterogeneous data sources into existing traffic management systems. This thesis presents a Smart Traffic Management framework for future cities. The proposed framework fuses different data sources and technologies to improve traffic prediction and incident detection systems. It is composed of two components: a social media component and a simulator component. The social media component consists of a text classification algorithm to identify traffic-related tweets. These traffic messages are then geolocated using Natural Language Processing (NLP) techniques. Finally, with the purpose of further analysing user emotions within the tweet, stress and relaxation strength detection is performed. The proposed text classification algorithm outperformed similar studies in the literature and proved to be more accurate than other machine learning algorithms on the same dataset. 
Results from the stress and relaxation analysis detected a significant amount of stress in 40% of the tweets, while the remaining tweets did not show any associated emotions. This information can potentially be used for policy making in transportation, to understand the users' perception of the transportation network. The simulator component proposes an optimisation procedure for determining missing roundabout and urban road flow distributions using constrained optimisation. Existing imputation methodologies have been developed on straight sections of highways, and their applicability to more complex networks has not been validated. This task presented a solution for the unavailability of roadway sensors in specific parts of the network and was able to successfully predict the missing values with very low percentage error. The proposed imputation methodology can serve as an aid for existing traffic forecasting and incident detection methodologies, as well as for the development of more realistic simulation networks.

    Social media as intelligence in disaster response: eyewitness classification using community detection

    Disasters cause widespread devastation to both physical infrastructure and the lives of individuals residing in large geographic areas. The disruption caused by disaster events is further compounded by high levels of uncertainty and information scarcity, presenting significant challenges to disaster response organisations and impeding the effectiveness of coordinated response efforts. The increasing use of digital technologies, such as social media, presents valuable sources of information that are available in real-time from geographically-distributed networks of ‘humans as sensors’. The data generated by these technologies can supplement traditional sources of intelligence to build models of situational awareness and inform decision-making, resulting in more effective disaster response operations. This thesis proposes a method of curating social media data to enhance its usefulness as a source of intelligence for disaster response organisations during crisis events. The research was conducted in four phases: (i) An ethnographic study developed a conceptual framework of the values and challenges of social media intelligence as perceived by disaster response practitioners. High data volume and low rates of relevance were established as key factors impeding integration with existing intelligence sources. (ii) Empirical studies of Twitter discourse were conducted during eight disaster events to identify patterns of online behaviour and establish the informative potential of social media data as a rich source of eyewitness reports. (iii) Geoproximate preferential attachment (homophily) was identified in the structure of Twitter relationship networks. An eyewitness classification model integrated relationship features for data curation. The model was evaluated on temporally-partitioned subgraphs and shown to be effective in real-time environments. 
(iv) The classification model was validated in simulated disaster response scenarios conducted with emergency service practitioners. Feedback from participants confirmed the effectiveness of the approach in improving the practical value of social media data as a source of intelligence during disaster response operations.

    The Coalescent State: Assemblages of Surveillance and Public Policy

    Traditional public policy models are not fully capable of analysing the multiplicities of public policy, particularly when dealing with the rhizomatic qualities of surveillance and protest. Instead, public policy and its effects should be considered an emergent and intensive property of the assemblages that ebb and flow around policy issues. This thesis takes a programmatic approach to understanding discourse around protest as part of an attempt to operationalise assemblage-based research at a large scale.

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require a quick response in a clinical setting or in the operating theatre. To that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.

    Informational Power on Twitter: A Mixed-methods Exploration of User Knowledge and Technological Discourse About Information Flows

    Following a number of recent examples where social media users have been confronted by information flows that did not match their understandings of the platforms, there is a pressing need to examine public knowledge of information flows on these systems, to map how this knowledge lines up against the extant flows of these systems, and to explore the factors that contribute to the construction of knowledge about these systems. There is an immediacy to this issue because as social media sites become further entrenched as dominant vehicles for communication, knowledge about these technologies will play an ever-increasing role in users’ abilities to gauge the risks of information disclosure, to understand and respond to global information flows, to make meaningful decisions about use and participation, and to be a part of conversations around how information flows in these spaces should be governed. Ultimately, knowledge about how information flows through these platforms helps shape users’ informational power. This dissertation responds to such a need by investigating the extant state of information flows on the popular social media platform “Twitter” and user knowledge about information flows on Twitter, and by exploring how Twitter, Inc.’s messaging to users may impact users’ knowledge construction. Through a mixed-method approach that includes a science and technology studies informed technical analysis of the Twitter platform, a quantitative analysis of survey data gathered from Twitter users and non-users which tested knowledge of different aspects of information flows on Twitter, and a critical discourse analysis of Twitter’s messaging to users in the new-user orientation process, this dissertation theorizes how junctures and disjunctures among the three can impact individual power. 
Findings of this project suggest that while many of the protocols and algorithmic functions associated with real-time information production and consumption on Twitter are well understood by users and are clearly articulated by Twitter, Inc., other aspects of information flows on the platform, such as the commodification of user-generated content, the long-term lifecycle of Tweets (such as the archival of Twitter by the Library of Congress), and the differential global flows of information, are not as well understood by users, nor explained in as much detail by Twitter, Inc. This dissertation describes the resulting state of users’ informational power as one of “information flow solipsism.”