PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks
Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector,
have been attracting increasing attention due to their simplicity, scalability,
and effectiveness. However, compared to sophisticated deep learning
architectures such as convolutional neural networks, these methods usually
yield inferior results when applied to particular machine learning tasks. One
possible reason is that these text embedding methods learn the representation
of text in a fully unsupervised way, without leveraging the labeled information
available for the task. Although the low dimensional representations learned
are applicable to many different tasks, they are not particularly tuned for any
task. In this paper, we fill this gap by proposing a semi-supervised
representation learning method for text data, which we call
predictive text embedding (PTE). Predictive text embedding utilizes
both labeled and unlabeled data to learn the embedding of text. The labeled
information and different levels of word co-occurrence information are first
represented as a large-scale heterogeneous text network, which is then embedded
into a low dimensional space through a principled and efficient algorithm. This
low dimensional embedding not only preserves the semantic closeness of words
and documents, but also has a strong predictive power for the particular task.
Compared to recent supervised approaches based on convolutional neural
networks, predictive text embedding is comparable or more effective, much more
efficient, and has fewer parameters to tune.
Comment: KDD 201
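The abstract says that labeled information and several levels of word co-occurrence are first represented as a heterogeneous text network. The sketch below shows one way such a network could be assembled from a toy corpus. It assumes, beyond what the abstract states, that the network consists of three weighted edge sets (word-word, word-document, word-label); the function name `build_text_networks`, the `window` parameter, and the toy data are hypothetical, and the embedding step itself is not shown.

```python
from collections import Counter


def build_text_networks(docs, labels, window=2):
    """Build three weighted edge sets from tokenized documents.

    docs   : list of token lists
    labels : list of class labels, with None for unlabeled documents
    window : number of following tokens counted as co-occurring (assumed)
    """
    word_word = Counter()   # co-occurrence counts within the window
    word_doc = Counter()    # term frequency of each word in each document
    word_label = Counter()  # term frequency summed over documents of a class

    for doc_id, (tokens, label) in enumerate(zip(docs, labels)):
        for i, word in enumerate(tokens):
            word_doc[(word, doc_id)] += 1
            if label is not None:
                # Only labeled documents contribute word-label edges.
                word_label[(word, label)] += 1
            for context in tokens[i + 1 : i + 1 + window]:
                if context != word:
                    # Undirected edge: store the pair in sorted order.
                    word_word[tuple(sorted((word, context)))] += 1
    return word_word, word_doc, word_label


# Toy corpus: two labeled documents and one unlabeled document.
docs = [["good", "movie", "good", "plot"], ["bad", "movie"], ["good", "plot"]]
labels = ["pos", "neg", None]
ww, wd, wl = build_text_networks(docs, labels)
```

Here the unlabeled third document still adds word-word and word-document edges, which illustrates how unlabeled text can contribute to the network alongside labeled text.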
Sentiment Analysis using an ensemble of Feature Selection Algorithms
Sentiment analysis, an active research area within text mining, is commonly used to determine the opinions of people who have used a service or bought a product. It is the process of computationally identifying and categorizing the opinions expressed in a piece of text. Individuals post their opinions in reviews, tweets, comments, or discussions, all of which are unstructured information. Sentiment analysis summarizes these reviews into an overall conclusion that helps clients, individuals, and organizations make decisions. The main aim of this paper is to apply an ensemble approach to feature reduction methods from natural language processing and to analyze the results. An ensemble approach combines two or more methodologies. The feature reduction methods used are Principal Component Analysis (PCA) for feature extraction and Pearson's chi-squared statistical test for feature selection. The main contribution of this paper is to test experimentally whether careful feature selection combined with existing classification methodologies can yield better accuracy.
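As an illustration of the feature selection step only, the sketch below scores terms with Pearson's chi-squared statistic on a 2x2 table (term present or absent against class positive or not) and keeps the highest-scoring terms. It is a minimal pure-Python sketch on invented toy data, not the paper's pipeline: the PCA step and the classifier are omitted, and the names `chi2_score` and `select_top_k` are hypothetical.

```python
def chi2_score(docs, labels, term, positive_label):
    """Pearson chi-squared statistic for term presence vs. class membership."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        is_pos = label == positive_label
        if has_term and is_pos:
            n11 += 1
        elif has_term:
            n10 += 1
        elif is_pos:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    # Product of the four marginal totals of the 2x2 table.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # term or class is constant; no association measurable
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_top_k(docs, labels, positive_label, k):
    """Return the k terms with the highest chi-squared score."""
    vocab = sorted(set().union(*docs))
    scored = [(chi2_score(docs, labels, t, positive_label), t) for t in vocab]
    # Sort by descending score, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy data (invented for illustration): each document is a set of terms.
docs = [{"great", "film"}, {"great", "acting"}, {"poor", "film"}, {"poor", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
top_terms = select_top_k(docs, labels, "pos", k=2)
```

In this toy data "film" appears equally often in both classes and scores zero, while "great" and "poor" separate the classes perfectly and are the two terms selected.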