2,321 research outputs found

    A survey of data mining techniques for social media analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over the decades, from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

    Dictionary-Assisted Supervised Contrastive Learning

    Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods. Comment: 6 pages, 5 figures, EMNLP 202
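    The keyword-simplification step described in this abstract can be illustrated with a minimal sketch. The dictionary contents, the replacement token and the function name below are hypothetical choices made for illustration only; they are not taken from the paper, and the contrastive fine-tuning itself is not shown.

```python
import re

# Hypothetical toy dictionary and replacement token (assumptions, not from the paper).
ECONOMY_TERMS = {"inflation", "unemployment", "recession"}
FIXED_TOKEN = "<economy>"


def keyword_simplify(text, dictionary=ECONOMY_TERMS, token=FIXED_TOKEN):
    """Replace every word of `text` that appears in `dictionary` with one fixed token."""
    def swap(match):
        word = match.group(0)
        # Case-insensitive lookup; words outside the dictionary are kept unchanged.
        return token if word.lower() in dictionary else word

    # Match alphabetic words (with apostrophes) and leave punctuation and spacing intact.
    return re.sub(r"[A-Za-z']+", swap, text)


original = "Inflation is rising while unemployment falls."
simplified = keyword_simplify(original)
print(simplified)  # <economy> is rising while <economy> falls.
```

    In a setup like the one the abstract describes, the original and the simplified text of each example would then both be passed to the model during fine-tuning.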

    Literariness Revisited: Deviation vs. Entrenched Ideas

    ‘Literariness’ basically means foregrounding. In this study it means presenting a view that deviates from entrenched opinions. A poem by Emily Dickinson was manipulated: apart from the original version, we constructed two versions that were changed to express entrenched ideas. Readers rated their reactions on 6 aesthetic dimensions, each comprising 5 Likert scales. Finally, they compared the three versions.

    A large-scale sentiment analysis using political tweets

    Twitter has become a key element of political discourse in candidates’ campaigns. Political polarization on Twitter is vital to politicians, as Twitter is a popular public medium for analyzing and predicting public opinion concerning political events. The sentiment analysis of political tweets depends mainly on the quality of the sentiment lexicons, so it is crucial to create lexicons of the highest quality. In the proposed system, a domain-specific political lexicon is constructed using a supervised approach to extract extreme political opinion words and features in tweets. A political multi-class sentiment analysis (PMSA) system is developed on a big data platform to predict the inclination of tweets and infer election results by analyzing different political datasets, including the Trump election dataset and the BBC News politics dataset. The experiments compare political text classification using three different models: multinomial naïve Bayes (MNB), decision tree (DT) and linear support vector classification (SVC). Of the three, linear SVC performs better than the other two techniques. The analytical evaluation results show that the proposed system achieves 98% accuracy with linear SVC.

    Adaptive sentiment analysis

    Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods perform decently when they are targeted at a specific domain and writing style, they do not usually work well with texts that originate outside their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (aka word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end, nearly unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts.

    A Survey of Data Mining Techniques for Social Network Analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over the decades, from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

    “We are all in this together”: What are the challenges Google “helps” media industries with?

    Studies have shown that the platform companies Google and Facebook have disrupted how media companies organise their work, and some researchers claim they form a duopoly in digital advertising. However, Google says it supports media by “helping” media industries through funding and training. This study argues that by examining which media projects Google supports, we get a good overview of the challenges journalism is currently facing and the solutions for tackling them, and ultimately of how this connects to Google as a platform company and to its narrative. This study aims to investigate which media industry challenges Google tries to address through financial support and to examine the solutions to these challenges proposed in accepted Digital News Innovation Fund (DNI) projects. Thus, this research asks: What are the challenges for media and journalists that Google Digital News Initiative is addressing? What specific challenges get the largest support? What are the main solutions proposed in projects supported by Google DNI? Based on a review of the literature about the relationships between platform companies and media and responses to challenging conditions in the ecosystem of platforms, qualitative content analysis was used to examine the 102 projects in the last round of the DNI Fund. The analysis demonstrated that Google supports projects that fall into three categories: Business Model Innovations, Product Development in Editorial Processes and Ecosystem Development Approaches. One of the most interesting findings is that Google favours projects that offer solutions for increasing audience subscriptions rather than addressing what concerns publishers the most: Google’s dominance over digital advertising. The results open the discussion about possible signs that Google’s support for media industries is a “self-help” for its mission of organising the world’s information. Further research is needed to identify the content of the other projects Google presents as “help” to media industries.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.