4,109 research outputs found

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform.

    Function-as-a-Service Performance Evaluation: A Multivocal Literature Review

    Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear but we are currently lacking a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storages as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of experiments presented in the surveyed studies. We conclude with a discussion of gaps in literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.

    Emotion recognition and analysis of netizens based on micro-blog during covid-19 epidemic

    This research addresses emotion recognition and analysis of short micro-blog texts. Emotion recognition is an important area of text classification in Natural Language Processing. The data consist of 100K COVID-19-related micro-blog records collected by the Data fountain platform; the records are manually labeled, with the emotional tendency of each text marked as negative, positive, or neutral. The empirical part applies both dictionary-based and machine-learning emotion recognition. The algorithms used include support vector machine and naive Bayes based on TF-IDF, and support vector machine and LSTM based on word2vec. The five results are compared. Combined with statistical analysis methods, the emotions of netizens in the early stage of the epidemic are analyzed to gauge public opinion. The research uses machine learning algorithms combined with statistical analysis to analyze current events in real time, which is expected to be of great significance for the introduction and implementation of national policies.

    Novel platform for topic group mining, crowd opinion analysis and opinion leader identification in on-line social network platforms

    In recent years, topic group mining and massive crowd opinion analysis from on-line social network platforms have become some of the most important tasks not only in research but also in industry. Systems of this sort can identify similar topics from a very large dataset, group them together based on the topic, and analyse the inclination of the content's owner. To solve this problem, which involves research from a number of different areas, an integrated platform is needed. Most community mining techniques treat the network as a graph where nodes represent users and edges reflect the relationship between two users. One obvious drawback of these approaches is that they can only utilise the explicit user relationships provided by on-line social network platforms. All other possible relationships are ignored. Some on-line social network platforms restrict the length of content a user can publish. This causes traditional document clustering methods to perform poorly. Meanwhile, the restriction of content length also affects opinion mining performance since most content lacks contextual features. Hence, other context features that are not immediately or obviously related need to be investigated to improve performance in user inclination classification. This research proposes a novel three-layered platform. Two core technologies of the platform are topic group mining and user inclination analysis. The integrated approach was evaluated by a series of experiments to examine each core technology. The results indicate that the proposed integrated platform is able to produce the following results. 1) Scores up to 0.82 by V-measure evaluation function in topic group mining. 2) High accuracy rate in inclination mining. 3) A flexible and adaptable platform design which can accommodate different on-line social networks easily.

    Cashtag piggybacking: uncovering spam and bot activity in stock microblogs on Twitter

    Microblogs are increasingly exploited for predicting prices and traded volumes of stocks in financial markets. However, it has been demonstrated that much of the content shared in microblogging platforms is created and publicized by bots and spammers. Yet, the presence (or lack thereof) and the impact of fake stock microblogs have never been systematically investigated before. Here, we study 9M tweets related to stocks of the 5 main financial markets in the US. By comparing tweets with financial data from Google Finance, we highlight important characteristics of Twitter stock microblogs. More importantly, we uncover a malicious practice - referred to as cashtag piggybacking - perpetrated by coordinated groups of bots and likely aimed at promoting low-value stocks by exploiting the popularity of high-value ones. Among the findings of our study is that as much as 71% of the authors of suspicious financial tweets are classified as bots by a state-of-the-art spambot detection algorithm. Furthermore, 37% of them were suspended by Twitter a few months after our investigation. Our results call for the adoption of spam and bot detection techniques in all studies and applications that exploit user-generated content for predicting the stock market.