Distributed Representations of Sentences and Documents
Many machine learning algorithms require the input to be represented as a
fixed-length feature vector. When it comes to texts, one of the most common
fixed-length features is bag-of-words. Despite their popularity, bag-of-words
features have two major weaknesses: they lose the ordering of the words and
they also ignore semantics of the words. For example, "powerful," "strong" and
"Paris" are equally distant. In this paper, we propose Paragraph Vector, an
unsupervised algorithm that learns fixed-length feature representations from
variable-length pieces of text, such as sentences, paragraphs, and documents.
Our algorithm represents each document by a dense vector which is trained to
predict words in the document. Its construction gives our algorithm the
potential to overcome the weaknesses of bag-of-words models. Empirical results
show that Paragraph Vectors outperform bag-of-words models as well as other
techniques for text representations. Finally, we achieve new state-of-the-art
results on several text classification and sentiment analysis tasks.
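The core idea above, a dense per-document vector trained to predict the words in that document, can be sketched as follows. This is a minimal sketch of the distributed bag-of-words variant (PV-DBOW), assuming a plain full softmax and per-word SGD rather than the hierarchical softmax or sampling schemes a practical implementation would use. The function names, hyperparameters, and toy documents are hypothetical and chosen only for illustration.

```python
import numpy as np

def train_pv_dbow(docs, dim=16, epochs=200, lr=0.1, seed=0):
    """Hypothetical PV-DBOW sketch: each document vector D[d] is trained,
    through a shared softmax layer U, to predict the words of document d."""
    rng = np.random.default_rng(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    D = rng.normal(scale=0.1, size=(len(docs), dim))   # one dense vector per document
    U = rng.normal(scale=0.1, size=(dim, len(vocab)))  # shared output (softmax) weights
    for _ in range(epochs):
        for d, doc in enumerate(docs):
            for w in doc:
                logits = D[d] @ U
                probs = np.exp(logits - logits.max())   # numerically stable softmax
                probs /= probs.sum()
                grad = probs.copy()
                grad[index[w]] -= 1.0                    # gradient of -log p(w | d) w.r.t. logits
                grad_D = U @ grad                        # computed before U is updated
                U -= lr * np.outer(D[d], grad)
                D[d] -= lr * grad_D
    return D, U, vocab

def doc_word_probs(D, U, d):
    """Softmax distribution over the vocabulary predicted by document d's vector."""
    logits = D[d] @ U
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

docs = [
    "the movie was powerful and moving".split(),
    "a strong and moving film".split(),
    "paris is the capital of france".split(),
]
D, U, vocab = train_pv_dbow(docs)
```

After training, each row of `D` is a fixed-length vector for a variable-length document and can be fed to any standard classifier; unlike a bag-of-words vector, its dimensionality does not depend on the vocabulary size.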
TED Talk Recommender Using Speech Transcripts
Nowadays, online video platforms mostly recommend related videos by analyzing
user-driven data such as viewing patterns, rather than the content of the
videos. However, content is more important than any other element when videos
aim to deliver knowledge. Therefore, we have developed a web application that
recommends related TED lecture videos to users based on the content of the
videos' transcripts. TED Talk Recommender constructs a network of videos that
are similar in content, uses it to make recommendations, and provides a user
interface.
Comment: 3 pages
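The abstract does not say how transcript similarity is measured or how the network is built. The sketch below assumes one common approach: TF-IDF vectors over transcript words, cosine similarity, and an edge from each talk to its k most similar talks. All function names, the value of `k`, and the toy titles and transcripts are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(transcripts):
    """Turn each transcript into a sparse TF-IDF dict (word -> weight)."""
    tokenized = [t.lower().split() for t in transcripts]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)  # guard against empty transcripts
        vectors.append({w: (c / length) * math.log(n / df[w]) for w, c in tf.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_similarity_network(titles, transcripts, k=2):
    """Link each talk to its k most content-similar talks (adjacency dict)."""
    vecs = tfidf_vectors(transcripts)
    graph = {}
    for i, title in enumerate(titles):
        scores = [(cosine(vecs[i], vecs[j]), titles[j])
                  for j in range(len(titles)) if j != i]
        scores.sort(reverse=True)  # most similar first
        graph[title] = [t for _, t in scores[:k]]
    return graph

titles = ["Ocean plastics", "Coral reefs", "Neural networks", "Deep learning"]
transcripts = [
    "plastic waste is filling the ocean and harming marine life",
    "coral reefs in the ocean support marine life",
    "neural networks learn representations from data",
    "deep learning trains neural networks on large data",
]
graph = build_similarity_network(titles, transcripts, k=1)
```

Recommending related talks then reduces to looking up a talk's neighbours in `graph`; a real system would likely add stop-word removal and stemming before computing TF-IDF.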