Search CORE

606 research outputs found

A blog mining framework

Authors: Cao J, Chau M, Lam P, Shiu B, Xu J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

Blogs have become increasingly popular, and new blogs are created every day. Much of their content is useful for applications in various domains, such as business, politics, research, social work, and linguistics. However, automatically collecting and analyzing blogs isn't straightforward due to the large size and dynamic nature of the blogosphere. In this article, the authors propose a framework for blog mining that includes spiders, parsers, analyzers, and visualizers. They present several examples of blog mining applications based on their framework. © 2006 IEEE.

HKU Scholars Hub

Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter

Authors: Cresci Stefano, Lillo Fabrizio, Regoli Daniele, Tardelli Serena, Tesconi Maurizio
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 17/07/2018

Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared on microblogging platforms is created and publicized by bots and spammers. Yet the presence (or lack thereof) and the impact of fake stock microblogs have never been systematically investigated. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice, referred to as cashtag piggybacking, perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Among the findings of our study is that as many as 71% of the authors of suspicious financial tweets are classified as bots by a state-of-the-art spambot detection algorithm. Furthermore, 37% of them were suspended by Twitter a few months after our investigation. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market.

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Analysing features of Japanese splogs and characteristics of keywords

Authors: Hiroshi Nakagawa, Noriko Kando, Takehito Utsuro, Tomohiro Fukuhara, Yasuhide Kawada, Yoshiaki Murakami, Yuuki Sato
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008

This paper focuses on analyzing (Japanese) splogs based on various characteristics of the keywords they contain. By analyzing these keyword characteristics, we estimate the behavior of spammers when they create splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that we can efficiently (manually) collect splogs by sampling blog homepages containing keywords of a certain type on the date of their most frequent occurrence. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs are created by a very small number of spammers.

CiteSeerX

Crossref

A study on Analysis and Utilization of Crowd-sourced Spatio-temporal Contexts from Social Media

Author: 若宮翔子
Publication date: 07/02/2014

Graduate School, University of Hyogo, 201

University of Hyogo Academic Repository / 兵庫県立大学学術情報リポジトリ

Identifying long-term periodic cycles and memories of collective emotion in online social media

Authors: Havlin Shlomo, Sano Yukie, Takayasu Hideki, Takayasu Misako
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Collective emotion has traditionally been evaluated through questionnaire surveys of a limited number of people. Recently, large volumes of written text on the Internet have become available for analyzing collective emotion at very large scales. Although the short-term relationship between collective emotion and real social phenomena has been widely studied, the long-term dynamics of collective emotion have not been studied so far because of the lack of long, persistent data sets. In this study, we extracted collective emotion over a 10-year period from 3.6 billion Japanese blog articles. First, we find that collective emotion shows clear periodic cycles, i.e., weekly and seasonal behaviors, accompanied by pulses caused by natural disasters. For example, April is characterized by high Tension, probably due to the start of the school year in Japan. We also identified long-term memory in the collective emotion, characterized by power-law decay of the autocorrelation function over several months. Comment: 19 pages, 5 figures, 2 tables, accepted PLOS ON

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

BlogForever D2.6: Data Extraction Methodology

Authors: Banos V., Davis R., Gkotsis G., Pincent E., Stepanyan K.
Publication date: 25/10/2013

This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter

Authors: Abbasi RA, Aljohani NR, Daud A, Maqbool O, Razzak I, Sadaf A, Saeed Z, Xu G
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2019

In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real time. Twitter presents an interesting opportunity for detecting events happening around the world. Tweets are short and pose diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new to this area. We also describe and compare data collection techniques, the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter. © 2019, Springer Nature B.V.

Deakin Research Online

OPUS - University of Technology Sydney