Authorship Attribution Using a Neural Network Language Model
In practice, training language models for individual authors is often
expensive because of limited data resources. In such cases, Neural Network
Language Models (NNLMs) generally outperform traditional non-parametric
N-gram models. Here we investigate the performance of a feed-forward NNLM on an
authorship attribution problem with a moderate author set size and relatively
limited data. We also consider how text topic impacts performance. Compared
with a well-constructed N-gram baseline with Kneser-Ney smoothing, the
proposed method achieves a nearly 2.5% reduction in perplexity and increases
author classification accuracy by 3.43% on average, given as few as 5 test
sentences. The performance is very competitive with the state of the art in
terms of accuracy and demand on test data. The source code, preprocessed
datasets, and a detailed description of the methodology and results are available
at https://github.com/zge/authorship-attribution.
Comment: Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI'16)
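As a rough illustration of the attribution rule the abstract implies, the Python sketch below assigns a test sample to the author whose language model yields the lowest perplexity. The `author_models` dictionary and the `score_fn` interface are assumptions for illustration, not the authors' actual code (see the linked repository for that).

```python
import math

def perplexity(per_sentence_log_probs):
    """Perplexity of a sample from per-token log-probabilities (natural log)."""
    n_tokens = sum(len(lp) for lp in per_sentence_log_probs)
    total = sum(sum(lp) for lp in per_sentence_log_probs)
    return math.exp(-total / n_tokens)

def attribute(test_sentences, author_models, score_fn):
    """Assign the sample to the author whose model gives the lowest perplexity.

    author_models: dict mapping author name -> trained language model
    score_fn(model, sentence): per-token log-probabilities for one sentence
    """
    best_author, best_ppl = None, float("inf")
    for author, model in author_models.items():
        ppl = perplexity([score_fn(model, s) for s in test_sentences])
        if ppl < best_ppl:
            best_author, best_ppl = author, ppl
    return best_author
```

The same rule works for both model families in the paper: only `score_fn` changes between the NNLM and the Kneser-Ney N-gram baseline.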
PTE: Predictive Text Embedding through Large-scale Heterogeneous Text Networks
Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector,
have been attracting increasing attention due to their simplicity, scalability,
and effectiveness. However, compared to sophisticated deep learning
architectures such as convolutional neural networks, these methods usually
yield inferior results when applied to particular machine learning tasks. One
possible reason is that these text embedding methods learn the representation
of text in a fully unsupervised way, without leveraging the labeled information
available for the task. Although the low dimensional representations learned
are applicable to many different tasks, they are not particularly tuned for any
task. In this paper, we fill this gap by proposing a semi-supervised
representation learning method for text data, which we call
predictive text embedding (PTE). Predictive text embedding utilizes
both labeled and unlabeled data to learn the embedding of text. The labeled
information and different levels of word co-occurrence information are first
represented as a large-scale heterogeneous text network, which is then embedded
into a low dimensional space through a principled and efficient algorithm. This
low dimensional embedding not only preserves the semantic closeness of words
and documents, but also has a strong predictive power for the particular task.
Compared to recent supervised approaches based on convolutional neural
networks, predictive text embedding achieves comparable or better
effectiveness, is much more efficient, and has fewer parameters to tune.
Comment: KDD 2015
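To make the network construction concrete, here is a minimal Python sketch of the three bipartite networks the abstract describes (word-word co-occurrence, word-document, and word-label). The input format, the `window` parameter, and the function name are assumptions for illustration; the joint embedding step itself (edge sampling over these networks) is omitted.

```python
from collections import Counter

def build_networks(docs, labels=None, window=5):
    """Edge weights for the word-word, word-document, and word-label
    bipartite networks that PTE embeds jointly.

    docs:   list of tokenized documents (lists of word strings)
    labels: optional list of labels aligned with docs; None entries are
            unlabeled documents (the semi-supervised case)
    """
    ww, wd, wl = Counter(), Counter(), Counter()
    for i, doc in enumerate(docs):
        for pos, w in enumerate(doc):
            wd[(w, i)] += 1                        # word-document edge
            if labels is not None and labels[i] is not None:
                wl[(w, labels[i])] += 1            # word-label edge (supervision)
            for v in doc[pos + 1 : pos + 1 + window]:
                ww[(w, v)] += 1                    # word-word co-occurrence edge
    return ww, wd, wl

# Usage: labeled and unlabeled documents can be mixed freely.
docs = [["good", "fast", "cheap"], ["slow", "service"]]
ww, wd, wl = build_networks(docs, labels=["pos", None], window=2)
```

The word-label network is what injects the task supervision; the other two networks capture the same unsupervised co-occurrence signal that Skip-gram-style embeddings use.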
Music Similarity Estimation
Music is a complex form of communication through which creators and cultures express their individuality. Since the digitalization of music, recommendation systems and other online services have become indispensable in the field of Music Information Retrieval (MIR). To build these systems and recommend the right song to the user, classification of songs is required. In this paper, we propose an approach for estimating similarity between pieces of music based on mid-level attributes, namely pitch, the MIDI value corresponding to pitch, interval, contour, and duration, and applying text-based classification techniques. For Western music, our system predicts the genres jazz, metal, and ragtime; this experiment is conducted on 450 music files, and the maximum accuracy achieved is 95.8% across different n-grams. We have also analyzed Indian classical Carnatic music and classify it by raga; our system predicts the Sankarabharam, Mohanam, and Sindhubhairavi ragas. This experiment is conducted on 95 music files, and the maximum accuracy achieved is 90.3% across different n-grams. Performance is evaluated using the accuracy score from scikit-learn.
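A minimal sketch of the text-style n-gram classification pipeline the abstract describes, using scikit-learn. The toy attribute sequences, labels, and choice of Multinomial Naive Bayes are illustrative assumptions; the abstract only states the mid-level attribute tokens, the n-gram sweep, and the scikit-learn accuracy score.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins: each "document" is a space-separated sequence of mid-level
# attribute tokens (pitch, interval direction, duration) for one music file.
docs = [
    "C4 up E4 down D4 short",
    "A3 down G3 down F3 long",
    "C4 up E4 up G4 short",
    "A3 down F3 down D3 long",
]
labels = ["jazz", "metal", "jazz", "metal"]

for n in (1, 2, 3):  # sweep the n-gram size, as the paper does
    vec = CountVectorizer(ngram_range=(1, n), token_pattern=r"\S+")
    X = vec.fit_transform(docs)
    clf = MultinomialNB().fit(X[:2], labels[:2])   # train on the first half
    preds = clf.predict(X[2:])                     # test on the second half
    print(f"n={n}: accuracy = {accuracy_score(labels[2:], preds):.2f}")
```

Treating the attribute sequence as a "text" is what lets standard bag-of-n-grams vectorization and classifiers apply directly to symbolic music.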