Opinion Mining on Non-English Short Text
As the type and the number of such venues increase, automated analysis of
sentiment on textual resources has become an essential data mining task. In
this paper, we investigate the problem of mining opinions from collections of
informal short texts. Both the positive and negative sentiment strengths of texts
are detected. We focus on a non-English language that has few resources for
text mining. This approach would help enhance the sentiment analysis in
languages where a list of opinionated words does not exist. We propose a new
method that projects the text into dense, low-dimensional feature vectors
according to the sentiment strength of the words. We detect the mixture of
positive and negative sentiments on a multi-variant scale. Empirical evaluation
of the proposed framework on Turkish tweets shows that our approach achieves good
results for opinion mining.
A Hybrid Statistical Data Pre-processing Approach for Language-Independent Text Classification
Data pre-processing is an important topic in Text Classification (TC). It aims to convert the original textual data into a data-mining-ready structure, in which the most significant text features that serve to differentiate between text categories are identified. Broadly speaking, textual data pre-processing techniques can be divided into three groups: (i) linguistic, (ii) statistical, and (iii) hybrid, combining (i) and (ii). With regard to language-independent TC, our study relates to the statistical aspect only. The nature of textual data pre-processing includes