
The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems

Authors: Gao Wanling, Jia Zhen, Shi Yingjie, Wang Lei, Zhan Jianfeng, Zhang Lixin, Zhou Runlin, Zhu Chunge
Publication date: 30/07/2013

We live in an era of big data, and big data applications are becoming increasingly pervasive. How to benchmark data center computer systems running big data applications (big data systems, for short) is an active research topic. In this paper, we focus on measuring the performance impact of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, our experiments yield two major results. First, data scale has a significant impact on the performance of big data systems, so big data benchmarks must provide scalable volumes of data sets. Second, although all four applications use simple algorithms, their performance trends differ as data scale increases, so benchmarks of big data systems must consider not only a variety of data sets but also a variety of applications.

Comment: 16 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

Thumbs up? Sentiment Classification using Machine Learning Techniques

Authors: Lee Lillian, Pang Bo, Vaithyanathan Shivakumar
Publication date: 01/01/2002

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

Comment: To appear in EMNLP-2002

arXiv.org e-Print Archive

CiteSeerX