
2,321 research outputs found

A survey of data mining techniques for social media analysis

Author: Adedoyin-Olowe Mariam
Gaber Mohamed Medhat
Stahl Frederic
Publication venue: Episciences
Publication date: 16/04/2014

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.

arXiv.org e-Print Archive

Central Archive at the University of Reading

Crossref

Dictionary-Assisted Supervised Contrastive Learning

Author: Bonneau Richard
Nagler Jonathan
Tucker Joshua A.
Wu Patrick Y.
Publication date: 27/10/2022

Text analysis in the social sciences often involves using specialized dictionaries to reason with abstract concepts, such as perceptions about the economy or abuse on social media. These dictionaries allow researchers to impart domain knowledge and note subtle usages of words relating to a concept(s) of interest. We introduce the dictionary-assisted supervised contrastive learning (DASCL) objective, allowing researchers to leverage specialized dictionaries when fine-tuning pretrained language models. The text is first keyword-simplified: a common, fixed token replaces any word in the corpus that appears in the dictionary(ies) relevant to the concept of interest. During fine-tuning, a supervised contrastive objective draws closer the embeddings of the original and keyword-simplified texts of the same class while pushing further apart the embeddings of different classes. The keyword-simplified texts of the same class are more textually similar than their original text counterparts, which additionally draws the embeddings of the same class closer together. Combining DASCL and cross-entropy improves classification performance metrics in few-shot learning settings and social science applications compared to using cross-entropy alone and alternative contrastive and data augmentation methods. Comment: 6 pages, 5 figures, EMNLP 202
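The keyword-simplification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `keyword_simplify`, the placeholder token `<keyword>`, the toy dictionary `economy_terms`, and the simple regex word matching are all assumptions made for illustration.

```python
import re

def keyword_simplify(text, dictionary, token="<keyword>"):
    """Replace every word of `text` found in `dictionary` with one fixed token.

    Matching is case-insensitive and works on plain alphabetic words;
    both choices are assumptions of this sketch.
    """
    vocab = {word.lower() for word in dictionary}

    def swap(match):
        word = match.group(0)
        return token if word.lower() in vocab else word

    # Visit each alphabetic word and leave punctuation and spacing untouched.
    return re.sub(r"[A-Za-z']+", swap, text)

# Hypothetical dictionary for the concept "the economy".
economy_terms = {"inflation", "unemployment", "recession"}

original = "Inflation and unemployment rose during the recession."
simplified = keyword_simplify(original, economy_terms)
print(simplified)
```

In a DASCL-style setup, each original text and its simplified counterpart would both be encoded during fine-tuning, with the contrastive objective pulling together same-class pairs; that training loop is outside the scope of this sketch.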

arXiv.org e-Print Archive

Literariness Revisited: Deviation vs. Entrenched Ideas

Author: Chesnokova Anna
van Peer Willie
Publication date: 01/01/2016

‘Literariness’ basically means foregrounding. In this study it means: presenting a view that deviates from entrenched opinions. A poem by Emily Dickinson was manipulated: apart from the original version, we constructed two versions that were changed to express entrenched ideas. Readers rated their reactions on 6 aesthetic dimensions, each comprising 5 Likert scales. Finally, they compared the three versions.

Borys Grinchenko Kyiv University Institutional repository

A large-scale sentiment analysis using political tweets

Author: Khaing Myo
Tun Yin Min
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2023

Twitter has become a key element of political discourse in candidates’ campaigns. Political polarization on Twitter is vital to politicians, as it is a popular public medium for analyzing and predicting public opinion concerning political events. The analysis of the sentiment of political tweets mainly depends on the quality of sentiment lexicons. Therefore, it is crucial to create sentiment lexicons of the highest quality. In the proposed system, a domain-specific political lexicon is constructed using a supervised approach to extract extreme political opinion words and features from tweets. A political multi-class sentiment analysis (PMSA) system on a big data platform is developed to predict the inclination of tweets and infer the results of elections by conducting the analysis on different political datasets, including the Trump election dataset and the BBC News politics dataset. A comparative analysis of the experimental results evaluates political text classification with three different models: multinomial naïve Bayes (MNB), decision tree (DT), and linear support vector classification (SVC). Of the three models, linear SVC performs better than the other two techniques. The analytical evaluation results show that the proposed system achieves 98% accuracy with linear SVC.

Institute of Advanced Engineering and Science

Adaptive sentiment analysis

Author: Mudinas Andrius

Domain dependency is one of the most challenging problems in the field of sentiment analysis. Although most sentiment analysis methods have decent performance if they are targeted at a specific domain and writing style, they do not usually work well with texts that originate outside of their domain boundaries. Often there is a need to perform sentiment analysis in a domain where no labelled document is available. To address this scenario, researchers have proposed many domain adaptation or unsupervised sentiment analysis methods. However, there is still much room for improvement, as those methods typically cannot match conventional supervised sentiment analysis methods. In this thesis, we propose a novel aspect-level sentiment analysis method that seamlessly integrates lexicon- and learning-based methods. While its performance is comparable to existing approaches, it is less sensitive to domain boundaries and can be applied to cross-domain sentiment analysis when the target domain is similar to the source domain. It also offers more structured and readable results by detecting individual topic aspects and determining their sentiment strengths. Furthermore, we investigate a novel approach to automatically constructing domain-specific sentiment lexicons based on distributed word representations (also known as word embeddings). The induced lexicon has quality on a par with a handcrafted one and could be used directly in a lexicon-based algorithm for sentiment analysis, but we find that a two-stage bootstrapping strategy could further boost the sentiment classification performance. Compared to existing methods, such an end-to-end nearly-unsupervised approach to domain-specific sentiment analysis works out of the box for any target domain, requires no handcrafted lexicon or labelled corpus, and achieves sentiment classification accuracy comparable to that of fully supervised approaches. Overall, the contribution of this Ph.D. work to the research field of sentiment analysis is twofold. First, we develop a new sentiment analysis system which can, in a nearly-unsupervised manner, adapt to the domain at hand and perform sentiment analysis with minimal loss of performance. Second, we showcase this system in several areas (including finance, politics, and e-business), and investigate particularly the temporal dynamics of sentiment in such contexts.

Birkbeck Institutional Research Online

A Survey of Data Mining Techniques for Social Network Analysis

Author: Adedoyin-Olowe Mariam
Gaber Mohamed Medhat
Stahl Frederic
Publication date: 01/01/2014

Birmingham City University Open Access Repository

BCU Open Access

“We are all in this together”: What are the challenges Google “helps” media industries with?

Author: Nartise Ilze
Publication venue: Helsingfors universitet
Publication date: 01/01/2019

Studies have shown that the platform companies Google and Facebook have a disruptive effect on how media companies organise their work, and some researchers claim they are a duopoly in digital advertising. However, Google says it supports media by “helping” media industries through funding and training. This study argues that by examining what media projects Google supports, we get a good overview of what challenges journalism is currently facing and the solutions for tackling these problems, and ultimately, how this connects to Google as a platform company and to its narrative. This study aims to investigate which media industry challenges Google tries to address through financial support and to examine the solutions to these challenges proposed in accepted Digital News Innovation Fund (DNI) projects. Thus, this research asks: What are the challenges for media and journalists that the Google Digital News Initiative is addressing? What specific challenges get the largest support? What are the main solutions proposed in projects supported by Google DNI? Based on a review of the literature about the relationships between platform companies and media and responses to challenging conditions in the ecosystem of platforms, qualitative content analysis was used to examine the 102 projects in the last round of the DNI Fund. The analysis demonstrated that Google supports projects that fall into three directions: Business Model Innovations, Product Development in Editorial Processes and Ecosystem Development Approaches. One of the most interesting findings shows that Google favours supporting projects that concern solutions for increasing audience subscriptions rather than addressing what publishers are most concerned about: Google’s domination of digital advertising. The results open the discussion about the possible signs of Google’s support in media industries being a “self-help” for its mission of organising the world’s information. Further research is needed to identify the content of the other projects Google presents as “help” to media industries.

Helsingin yliopiston digitaalinen arkisto

High-Performance Modelling and Simulation for Big Data Applications

Publication venue: Springer Science and Business Media LLC
Publication date: 10/02/2021

This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. It is then arguably required to have a seamless interaction of High Performance Computing with Modelling and Simulation in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

Directory of Open Access Books (DOAB)

A geometric approach for fast affordance determination in 3D

Author: Ruiz Libreros Eduardo D
Publication date: 25/06/2019

Explore Bristol Research