4,109 research outputs found
Featuring, Detecting, and Visualizing Human Sentiment in Chinese Micro-Blog
2015-2016 > Academic research: refereed > Publication in refereed journa
BlogForever D2.6: Data Extraction Methodology
This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
Function-as-a-Service Performance Evaluation: A Multivocal Literature Review
Function-as-a-Service (FaaS) is one form of the serverless cloud computing
paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing
event-triggered code snippets (i.e., functions). Many studies that empirically
evaluate the performance of such FaaS platforms have started to appear but we
are currently lacking a comprehensive understanding of the overall domain. To
address this gap, we conducted a multivocal literature review (MLR) covering
112 studies from academic (51) and grey (61) literature. We find that existing
work mainly studies the AWS Lambda platform and focuses on micro-benchmarks
using simple functions to measure CPU speed and FaaS platform overhead (i.e.,
container cold starts). Further, we discover a mismatch between academic and
industrial sources on tested platform configurations, find that function
triggers remain insufficiently studied, and identify HTTP API gateways and
cloud storages as the most used external service integrations. Following
existing guidelines on experimentation in cloud systems, we discover many flaws
threatening the reproducibility of experiments presented in the surveyed
studies. We conclude with a discussion of gaps in literature and highlight
methodological suggestions that may serve to improve future FaaS performance
evaluation studies.
Emotion recognition and analysis of netizens based on micro-blog during covid-19 epidemic
This research addresses emotion recognition and analysis in short micro-blog texts. Emotion recognition is an important area of text classification in Natural Language Processing. The data consist of 100K micro-blog records on the COVID-19 theme, collected by the DataFountain platform and manually labeled with a negative, positive, or neutral emotional tendency. The empirical part applies both a dictionary-based emotion recognition method and machine learning methods. The algorithms used include support vector machine and naive Bayes based on TF-IDF, and support vector machine and LSTM based on word2vec. The five results are compared. Combined with statistical analysis methods, the emotions of netizens in the early stage of the epidemic are analyzed to assess public opinion. The research combines machine learning with statistical analysis to analyze current events in real time, which is of great significance for the introduction and implementation of national policies.
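The dictionary-based method mentioned in the abstract above can be illustrated with a minimal sketch. The lexicon, the `classify` function, and the English tokens are all assumptions made for illustration; the study itself works on Chinese micro-blog text, which would need a word segmenter and a curated sentiment dictionary.

```python
import re

# Hypothetical toy lexicon, not the dictionary used in the study.
POSITIVE = {"good", "hope", "thanks", "recover", "safe"}
NEGATIVE = {"bad", "fear", "sick", "panic", "worse"}


def classify(text):
    """Label a short text as positive, negative, or neutral by lexicon counts."""
    # Lowercase and keep alphabetic tokens only (an English-only simplification).
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(1 for t in tokens if t in POSITIVE)
    negative_hits = sum(1 for t in tokens if t in NEGATIVE)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sample in ("Thanks, doctors give us hope",
                   "Panic and fear everywhere",
                   "The city is quiet today"):
        print(sample, "->", classify(sample))
```

A lexicon count of this kind needs no training data, which is one reason it serves as a baseline next to the SVM, naive Bayes, and LSTM classifiers named in the abstract.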
Novel platform for topic group mining, crowd opinion analysis and opinion leader identification in on-line social network platforms
In recent years, topic group mining and large-scale crowd opinion analysis on on-line social network platforms have become some of the most important tasks not only in research but also in industry. Systems of this sort can identify similar topics in a very large dataset, group them by topic, and analyse the inclination of the content's owner. Because this problem draws on research from a number of different areas, an integrated platform is needed.
Most community mining techniques treat the network as a graph where nodes represent users and edges represent relationships between users. One obvious drawback of these approaches is that they can only utilise the explicit user relationships provided by on-line social network platforms; all other possible relationships are ignored. Some on-line social network platforms restrict the length of content a user can publish, which causes traditional document clustering methods to perform poorly. The length restriction also affects opinion mining performance, since most content lacks contextual features. Hence, other context features that are not immediately or obviously related need to be investigated to improve performance in user inclination classification.
This research proposes a novel three-layered platform. The two core technologies of the platform are topic group mining and user inclination analysis. The integrated approach was evaluated in a series of experiments that examine each core technology. The results indicate that the proposed integrated platform achieves the following: 1) scores of up to 0.82 on the V-measure evaluation function in topic group mining; 2) a high accuracy rate in inclination mining; 3) a flexible and adaptable platform design that can easily accommodate different on-line social networks.
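For readers unfamiliar with the V-measure score cited above, the following is a minimal sketch of how it is commonly computed, assuming the standard definition as the harmonic mean of homogeneity and completeness with equal weighting. The abstract does not state which variant or weighting the authors used, and the function names here are illustrative.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given), computed from two equally long label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g])
                for (g, _), c in joint.items())


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    if len(classes) != len(clusters):
        raise ValueError("label lists must have the same length")
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (1.0 if h_classes == 0
                   else 1.0 - _conditional_entropy(classes, clusters) / h_classes)
    # Completeness: all members of a class fall in the same cluster.
    completeness = (1.0 if h_clusters == 0
                    else 1.0 - _conditional_entropy(clusters, classes) / h_clusters)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score is invariant to how cluster ids are named: a clustering that matches the reference topics up to relabelling scores 1, while a clustering that is statistically independent of them scores 0.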
Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter
Microblogs are increasingly exploited for predicting prices and traded
volumes of stocks in financial markets. However, it has been demonstrated that
much of the content shared in microblogging platforms is created and publicized
by bots and spammers. Yet, the presence (or lack thereof) and the impact of
fake stock microblogs has never systematically been investigated before. Here,
we study 9M tweets related to stocks of the 5 main financial markets in the US.
By comparing tweets with financial data from Google Finance, we highlight
important characteristics of Twitter stock microblogs. More importantly, we
uncover a malicious practice - referred to as cashtag piggybacking -
perpetrated by coordinated groups of bots and likely aimed at promoting
low-value stocks by exploiting the popularity of high-value ones. Among the
findings of our study is that as much as 71% of the authors of suspicious
financial tweets are classified as bots by a state-of-the-art spambot detection
algorithm. Furthermore, 37% of them were suspended by Twitter a few months
after our investigation. Our results call for the adoption of spam and bot
detection techniques in all studies and applications that exploit
user-generated content for predicting the stock market
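The co-mention pattern described in the abstract above can be illustrated with a simple heuristic. This is a sketch under stated assumptions and not the authors' detection pipeline: the regular expression, the function names, and the ticker sets passed in (including the made-up ticker `XYZQ`) are hypothetical.

```python
import re

# A cashtag is taken here to be "$" followed by 1 to 6 letters (an assumption).
CASHTAG = re.compile(r"\$([A-Za-z]{1,6})\b")


def cashtags(tweet):
    """Return the set of upper-cased ticker symbols mentioned in a tweet."""
    return {symbol.upper() for symbol in CASHTAG.findall(tweet)}


def looks_like_piggybacking(tweet, high_value, low_value):
    """Flag tweets that mention both a high-value and a low-value ticker.

    high_value and low_value are caller-supplied sets of ticker symbols;
    how to build them (e.g. from market capitalisation) is left open.
    """
    tags = cashtags(tweet)
    return bool(tags & high_value) and bool(tags & low_value)


if __name__ == "__main__":
    high = {"AAPL", "GOOG"}
    low = {"XYZQ"}  # made-up ticker for illustration
    print(looks_like_piggybacking("Huge news $AAPL $GOOG $xyzq", high, low))
    print(looks_like_piggybacking("$AAPL earnings beat estimates", high, low))
```

A co-mention flag of this kind only surfaces candidate tweets; per the abstract, the study additionally relies on financial data and a spambot detection algorithm to assess the accounts behind them.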