11,746 research outputs found
KACST Arabic Text Classification Project: Overview and Preliminary Results
Electronically formatted Arabic free-texts can be found in abundance these days on the World Wide Web, often linked to commercial enterprises and/or government organizations. Vast tracts of knowledge and relations lie hidden within these texts, knowledge that can be exploited once the correct intelligent tools have been identified and applied. For example, text mining may help with text classification and categorization. Text classification aims to automatically assign text to a predefined category based on identifiable linguistic features. Such a process has different useful applications including, but not restricted to, E-Mail spam detection, web pages content filtering, and automatic message routing. In this paper an overview of King Abdulaziz City for Science and Technology (KACST) Arabic Text Classification Project will be illustrated along with some preliminary results. This project will contribute to the better understanding and elaboration of Arabic text classification techniques
Thumbs up? Sentiment Classification using Machine Learning Techniques
We consider the problem of classifying documents not by topic, but by overall
sentiment, e.g., determining whether a review is positive or negative. Using
movie reviews as data, we find that standard machine learning techniques
definitively outperform human-produced baselines. However, the three machine
learning methods we employed (Naive Bayes, maximum entropy classification, and
support vector machines) do not perform as well on sentiment classification as
on traditional topic-based categorization. We conclude by examining factors
that make the sentiment classification problem more challenging.Comment: To appear in EMNLP-200
- …