27,644 research outputs found
The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems
Now we live in an era of big data, and big data applications are becoming
more and more pervasive. How to benchmark data center computer systems running
big data applications (in short big data systems) is a hot topic. In this
paper, we focus on measuring the performance impacts of diverse applications
and scalable volumes of data sets on big data systems. For four typical data
analysis applications---an important class of big data applications, we find
two major results through experiments: first, the data scale has a significant
impact on the performance of big data systems, so we must provide scalable
volumes of data sets in big data benchmarks. Second, for the four applications,
even all of them use the simple algorithms, the performance trends are
different with increasing data scales, and hence we must consider not only
variety of data sets but also variety of applications in benchmarking big data
systems.Comment: 16 pages, 3 figure
Thumbs up? Sentiment Classification using Machine Learning Techniques
We consider the problem of classifying documents not by topic, but by overall
sentiment, e.g., determining whether a review is positive or negative. Using
movie reviews as data, we find that standard machine learning techniques
definitively outperform human-produced baselines. However, the three machine
learning methods we employed (Naive Bayes, maximum entropy classification, and
support vector machines) do not perform as well on sentiment classification as
on traditional topic-based categorization. We conclude by examining factors
that make the sentiment classification problem more challenging.Comment: To appear in EMNLP-200
- …